Sample ERDB Database

This XML file defines a small genetic database that demonstrates the features of the ERDB database engine.

A Genome contains the sequence data for a particular individual organism.
Scientific name of this genome, usually consisting of the genus, species, and unique characterization.
Version string for this genome, generally consisting of the genome ID followed by a period and a string of digits.
Number of [[protein encoding genes]] for this organism.
Number of RNA features found for this organism.
Indication of this organism's behavior relating to environmental oxygen.
Y/N/? flag indicating whether or not this organism is pathogenic.
This index allows applications to search for genomes by scientific name.

A contig is a contiguous run of base pairs. The contig's ID consists of the genome ID followed by a name that identifies which contig this is for the parent genome.
String consisting of the base pairs.

A feature (sometimes also called a "gene") is a part of a genome that is of special interest. Features may be spread across multiple contigs of a genome, but never across more than one genome. Features can be assigned to roles via spreadsheet cells, and are the targets of annotation. Each feature in the database has a unique FIG ID.
Code indicating the type of this feature. Among the codes currently supported are "peg" for a protein encoding gene, "bs" for a binding site, "opr" for an operon, and so forth.
(Optional) A translation of this feature's residues into protein character codes, formed by concatenating the pieces of the feature together. Only protein encoding genes have translations.
Default functional assignment for this feature.
Name of the user who made the functional assignment.
Quality of the functional assignment, usually a space, but may be W (indicating weak) or X (indicating experimental).

This relationship connects a genome to all of its features.
This relationship is redundant in a sense, because the genome ID is part of the feature ID; however, it makes the creation of certain queries more convenient because you can drag in filtering information for a feature's genome.
Feature type (e.g. peg, rna).
This index enables the application to view the features of a genome sorted by type.

This relationship connects a genome to the contigs that contain the actual genetic information.

This relationship connects a feature to the contig segments that work together to effect it. The segments are numbered sequentially starting from 1. The database is required to place an upper limit on the length of each segment. If a segment is longer than the maximum, it can be broken into smaller segments. The upper limit enables applications to locate all features that contain a specific residue. For example, if the upper limit is 100 and we are looking for a feature that contains residue 234 of contig *ABC*, we can look for features with a begin point between 135 and 333 (that is, within 99 residues of 234 on either side). The results can then be filtered by direction and length of the segment.
Sequence number of this segment.
Index (1-based) of the first residue in the contig that belongs to the segment.
Number of residues in the segment. A length of 0 identifies a specific point between residues. This is the point before the residue if the direction is forward and the point after the residue if the direction is backward.
Direction of the segment: + if it is forward and - if it is backward.
This index allows the application to find all the segments of a feature in the proper order.
This index is the one used by applications to find all the feature segments that contain a specific residue.
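
The residue lookup described above can be sketched in Python. This is a minimal illustration, not ERDB's own code: the `Segment` class, its field names, and the helper functions are all hypothetical, and the segments are held in a plain list rather than fetched through the database index. The sketch also assumes that the begin point of a backward segment is its highest-numbered residue, which is what the symmetric 135 to 333 window in the example suggests, and that a zero-length segment contains no residue.

```python
from dataclasses import dataclass

# Upper limit on segment length, taken from the example in the text.
MAX_SEGMENT_LENGTH = 100


@dataclass(frozen=True)
class Segment:
    """Hypothetical in-memory stand-in for one feature-to-contig segment."""
    feature_id: str
    contig_id: str
    begin: int       # 1-based begin point on the contig
    length: int      # number of residues in the segment
    direction: str   # '+' for forward, '-' for backward


def begin_window(residue, max_len=MAX_SEGMENT_LENGTH):
    """Range of begin points a segment containing `residue` can have."""
    return max(1, residue - (max_len - 1)), residue + (max_len - 1)


def covers(seg, residue):
    """Second-stage filter: check direction and length of one segment."""
    if seg.length == 0:
        # Assumption: a point between residues contains no residue.
        return False
    if seg.direction == '+':
        return seg.begin <= residue <= seg.begin + seg.length - 1
    # Assumption: a backward segment runs downward from its begin point.
    return seg.begin - seg.length + 1 <= residue <= seg.begin


def features_containing(segments, contig_id, residue, max_len=MAX_SEGMENT_LENGTH):
    """Return the sorted IDs of features with a segment containing `residue`."""
    low, high = begin_window(residue, max_len)
    # First stage: the cheap range test that an index on the begin point allows.
    candidates = [s for s in segments
                  if s.contig_id == contig_id and low <= s.begin <= high]
    # Second stage: filter the candidates by direction and length.
    return sorted({s.feature_id for s in candidates if covers(s, residue)})


segments = [
    Segment("f1", "ABC", 200, 50, '+'),   # covers 200..249
    Segment("f2", "ABC", 300, 80, '-'),   # covers 221..300
    Segment("f3", "ABC", 140, 50, '+'),   # covers 140..189, inside window but misses
    Segment("f4", "XYZ", 200, 50, '+'),   # different contig
]
print(begin_window(234))                          # (135, 333)
print(features_containing(segments, "ABC", 234))  # ['f1', 'f2']
```

The two-stage shape mirrors the text: the bounded begin-point window keeps the first pass to a simple range scan, and only the surviving candidates need the more detailed direction and length check.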