With this data integration class you can read sequences from multi-entry FASTA, EMBL, GenBank or Swiss-Prot flat files. You can import protein, DNA or RNA sequences, as well as sequences over arbitrary alphabets such as secondary structure symbols. However, most analyses, for example molecular weight or isoelectric point, require amino acid sequences.
Please note that this class extracts only the sequence information, not any additional annotations. To analyse UniProt or Swiss-Prot features, for example, use the SwissProt input class instead.
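To illustrate what "extracting only the sequence information" means for the FASTA case, here is a minimal Python sketch. The function `read_fasta` is a hypothetical helper, not part of this class's API, and it handles FASTA only (not EMBL, GenBank or Swiss-Prot). It keeps the first word of each header as the identifier, discards the rest of the header, and leaves sequence symbols untouched so that non-protein alphabets still pass through.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from multi-FASTA text.

    Hypothetical helper for illustration only. Annotations are not
    extracted: of each header line only the first word is kept as the
    identifier, and the free-text description is discarded.
    """
    identifier: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and legacy ';' comment lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if identifier is not None:
                yield identifier, "".join(chunks)
            words = line[1:].split()
            identifier = words[0] if words else ""
            chunks = []
        else:
            # Keep symbols as-is (no case folding, no alphabet check), so
            # protein, DNA/RNA and e.g. secondary structure strings all pass.
            # Lines that appear before the first header are never yielded.
            chunks.append("".join(line.split()))
    if identifier is not None:
        yield identifier, "".join(chunks)


example = """>seq1 hypothetical protein, description is ignored
MKTAYIAKQR
QISFVKSHFS
>seq2
ACDEFGHIK
"""

for ident, seq in read_fasta(example.splitlines()):
    print(ident, len(seq))  # prints "seq1 20" then "seq2 9"
```

Reading the other flat-file formats needs format-specific parsers; in Python, Biopython's `Bio.SeqIO.parse` supports, among others, the `"fasta"`, `"embl"`, `"genbank"` and `"swiss"` formats.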
Example files are located in /example_data/flatfile. For example, positive.fasta and negative.fasta are taken from an early training set of the protein crystallizability project.
[Smialowski, P., Schmidt, T., Cox, J., Kirschner, A., Frishman, D. (2005). Will my protein crystallize? A sequence-based predictor. Proteins: Structure, Function, and Bioinformatics, in press.]