This data adapter reads XML files containing numeric or symbolic data entries for a single protein set. In addition to these annotations, the file can define the protein set itself, independently of how many annotations are available for each protein. The protein set definition can be a plain list of protein identifiers or can also include each protein's sequence. This optional definition must be given in a property of type setdef.
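As an illustration of the two forms a protein set definition can take, the following minimal Python sketch builds a setdef property with the standard xml.etree.ElementTree module. It borrows the element and attribute names from this page's example file (property with type="setdef" and id="setdef", input with id and an optional value). The helper name build_setdef is hypothetical and not part of the adapter. A protein mapped to None is written as a bare identifier; otherwise its sequence goes into the value attribute.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_setdef(proteins: dict[str, Optional[str]]) -> ET.Element:
    """Hypothetical helper: build a setdef property from ID -> sequence (or None).

    Element and attribute names follow the GenericXML example; the
    id="setdef" attribute value is copied from that example.
    """
    prop = ET.Element("property", {"type": "setdef", "id": "setdef"})
    for protein_id, sequence in proteins.items():
        attrs = {"id": protein_id}
        if sequence is not None:
            # Sequence known: store it in the value attribute.
            attrs["value"] = sequence
        # Otherwise the input element carries only the identifier.
        ET.SubElement(prop, "input", attrs)
    return prop


setdef = build_setdef({"gi_123": "ACCCVMAD", "gi_234": None})
print(ET.tostring(setdef, encoding="unicode"))
```

Mixing both forms in one definition, as done here, is only for illustration. Whether the adapter accepts a mixed list should be checked against the DTD/XSD shipped with the installation.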

This makes it possible to import any kind of information, regardless of the semantics of the data.
The imported objects can then be analyzed further with the GenericXML compare engine.
A DTD and an XSD definition of the GenericXML file format can be found in the /res/generic_xml_defs folder of this installation, or alternatively within the jar file.

A GenericXML input file basically looks like this:

<dataset label="Escherichia_coli_k12">
    <property type="setdef" id="setdef">
        <input id="gi_123" value="ACCCVMAD" />
        <!-- or, without the sequence: <input id="gi_123" /> -->
    </property>
    <property type="numeric" id="orf.length">
        <input id="gi_123" value="66" />
        <input id="gi_234" value="2463" />
    </property>
    <property type="symbolic" id="funcat.fun_num">
        <input id="gi_123" value="01.01.01" />
        <input id="gi_234" value="01.01.04" />
    </property>
</dataset>
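To make the structure concrete, here is a minimal Python sketch that reads a file of this shape with the standard xml.etree.ElementTree module. It assumes a well-formed layout with every property element closed and a dataset root, as in the SAMPLE string. It also assumes that numeric values parse as floats. The function name parse_generic_xml and the returned data shapes are hypothetical and are not the adapter's API. The DTD/XSD in /res/generic_xml_defs remains the authoritative format definition.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Union

Value = Union[float, str]

# Well-formed variant of the example; gi_234 is listed without a sequence.
SAMPLE = """
<dataset label="Escherichia_coli_k12">
    <property type="setdef" id="setdef">
        <input id="gi_123" value="ACCCVMAD" />
        <input id="gi_234" />
    </property>
    <property type="numeric" id="orf.length">
        <input id="gi_123" value="66" />
        <input id="gi_234" value="2463" />
    </property>
    <property type="symbolic" id="funcat.fun_num">
        <input id="gi_123" value="01.01.01" />
        <input id="gi_234" value="01.01.04" />
    </property>
</dataset>
"""


def parse_generic_xml(text: str):
    """Hypothetical reader: return (label, protein_set, properties).

    protein_set maps protein ID -> sequence or None (from the setdef property);
    properties maps property id -> {protein ID: value}. Numeric values become
    floats, symbolic values stay strings. If a protein appears in several
    input nodes of one property, this sketch keeps only the last value.
    """
    root = ET.fromstring(text)
    protein_set: dict[str, Optional[str]] = {}
    properties: dict[str, dict[str, Value]] = {}
    for prop in root.findall("property"):
        kind = prop.get("type")
        if kind == "setdef":
            for node in prop.findall("input"):
                protein_set[node.get("id")] = node.get("value")
            continue
        values: dict[str, Value] = {}
        for node in prop.findall("input"):
            raw = node.get("value")
            if raw is None:
                continue  # annotation inputs without a value are skipped
            values[node.get("id")] = float(raw) if kind == "numeric" else raw
        properties[prop.get("id")] = values
    return root.get("label"), protein_set, properties
```

Applied to SAMPLE, this yields the label "Escherichia_coli_k12", the protein set {"gi_123": "ACCCVMAD", "gi_234": None}, and orf.length values of 66.0 and 2463.0. The FunCat numbers stay strings, which is why they belong in a symbolic property.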

Symbolic properties hold arbitrary strings; numeric properties hold numeric values.
If a protein has multiple annotations for the same property, you can either write multiple input nodes or, preferably, put all annotations into a single value separated by semicolons. For example, a protein with multiple functional annotations may be written as:
<input id="xxx" value="membrane;isomerase;chaperon"/>
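The two notations should describe the same annotations. The following Python sketch shows one way a reader could normalize them. It takes (protein ID, value) pairs as they would come from input nodes, splits each value on semicolons, and merges the pieces per protein. The function name collect_annotations is hypothetical. Trimming whitespace around each piece, dropping empty pieces, and keeping each annotation only once are assumptions of this sketch, not documented adapter behavior.

```python
from collections import defaultdict


def collect_annotations(nodes: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Hypothetical helper: merge (protein ID, value) pairs into annotation lists.

    Accepts both notations: several input nodes for one protein, and one
    value holding semicolon-separated annotations. Pieces are trimmed,
    empty pieces are dropped, and duplicates are kept only once, in order.
    """
    merged: dict[str, list[str]] = defaultdict(list)
    for protein_id, value in nodes:
        for annotation in value.split(";"):
            annotation = annotation.strip()
            if annotation and annotation not in merged[protein_id]:
                merged[protein_id].append(annotation)
    return dict(merged)


# Both notations describe the same three annotations for protein "xxx".
semicolon_form = [("xxx", "membrane;isomerase;chaperon")]
node_form = [("xxx", "membrane"), ("xxx", "isomerase"), ("xxx", "chaperon")]
print(collect_annotations(semicolon_form) == collect_annotations(node_form))  # True
```

Splitting on the semicolon assumes that no single annotation value itself contains a semicolon. This is one reason the single-value notation works best with simple labels such as the FunCat numbers.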

Example files are located in /example_data/genericxml/, e.g. EcoliK12.xml and EcoliK12_subset.xml. The subset file contains all proteins that are predicted to have multiple membrane segments. Both files contain SCOP classes and folds as well as FunCat functional annotations.