The metadata structures describe the entities and relationships implemented in the database. They are, in fact, a database describing the database itself.
An entity is a real or abstract thing on which we wish to keep data. The terms entity and object are mostly interchangeable; however, for our purposes, object will only be used to describe an entity instance, rather than an entity type. In the relations that implement an entity, there must be an ID field that contains the entity key.
|entity-id||(key) displayable common name of the entity|
|relation-id||(multiple) a relation used to implement the entity|
A relationship is a connection between a pair of entities.
|relationship-id||(key) displayable common name of the relationship|
|relation-id||relation used to implement the relationship|
|arity||type of relationship: 1-to-many, many-to-many, many-to-1, or 1-to-1|
|source-entity-id||name of the entity type from which the relationship starts|
|target-entity-id||name of the entity type at which the relationship ends|
A relation is a physical table that implements a relationship or partly implements an entity.
|name||(key) name of the physical relation|
A field is a physical table column that ultimately contains the actual data.
|relation-id||(key.1) ID of the relation containing this field|
|name||(key.2) name of the field|
|data-type||type of data stored in the field|
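To illustrate how the four metadata structures fit together, the following Python sketch models them as plain records. Only the field names come from the tables above; the class names, the helper function `fields_of`, and all sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EntityDef:
    entity_id: str                                          # (key) displayable common name
    relation_ids: List[str] = field(default_factory=list)  # relations implementing the entity


@dataclass
class RelationshipDef:
    relationship_id: str   # (key) displayable common name
    relation_id: str       # relation implementing the relationship
    arity: str             # "1-to-many", "many-to-many", "many-to-1", or "1-to-1"
    source_entity_id: str  # entity type from which the relationship starts
    target_entity_id: str  # entity type at which the relationship ends


@dataclass
class RelationDef:
    name: str              # (key) name of the physical relation


@dataclass
class FieldDef:
    relation_id: str       # (key.1) relation containing this field
    name: str              # (key.2) name of the field
    data_type: str         # type of data stored in the field


def fields_of(relation_name: str, all_fields: List[FieldDef]) -> List[str]:
    """Return the names of the fields that belong to one relation."""
    return [f.name for f in all_fields if f.relation_id == relation_name]


# Hypothetical sample metadata; none of these names are taken from the real schema.
genome = EntityDef("Genome", ["Genome"])
has_contig = RelationshipDef("HasContig", "HasContig", "1-to-many", "Genome", "Contig")
sample_fields = [
    FieldDef("Genome", "id", "string"),
    FieldDef("Genome", "name", "string"),
    FieldDef("HasContig", "id", "string"),
]
```

With this sample data, `fields_of("Genome", sample_fields)` lists the columns of the relation that implements the Genome entity.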
The following methods are provided to access data in the database. Methods that allow iteration will have GetFirst and GetNext versions. For example, the GetObjects operation will be implemented as two methods: GetFirstObject and GetNextObject.
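The GetFirst/GetNext convention can be sketched as a small cursor. This is only an illustration of the calling pattern: the `ObjectCursor` class, its in-memory row list, and the use of `None` to signal the end of the iteration are assumptions, not part of the actual access layer.

```python
class ObjectCursor:
    """Hypothetical cursor exposing the GetFirst/GetNext pattern."""

    def __init__(self, rows):
        self._rows = list(rows)  # stand-in for a database result set
        self._pos = 0

    def get_first_object(self):
        # Restart the iteration and return the first object, or None if empty.
        self._pos = 0
        return self.get_next_object()

    def get_next_object(self):
        # Return the next object, or None once the results are exhausted.
        if self._pos >= len(self._rows):
            return None
        row = self._rows[self._pos]
        self._pos += 1
        return row


def collect_all(cursor):
    """Drain a cursor using the GetFirst/GetNext calling convention."""
    results = []
    obj = cursor.get_first_object()
    while obj is not None:
        results.append(obj)
        obj = cursor.get_next_object()
    return results
```

A caller asks for the first object once and then keeps asking for the next one until the cursor reports that nothing is left.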
The contig-id is the genome-id and the contig name. A CONTIG is a contiguous section of a genome that was produced by a sequencing project. The CONTIGs are named and generated externally and then loaded into the database.
The sequence id is the contig-id and the begin point. The sequence is an ordered collection of characters from an alphabet. For each character in the sequence, the quality vector holds an integer exponent indicating the likelihood of an error. So, a quality value of 30 means the likelihood that the character is correct is (1 - 10^-30).
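The quality arithmetic can be made concrete with a short sketch that follows the definition given here, where a quality value q corresponds to an error likelihood of 10^-q. The function names are hypothetical. Exact fractions are used because a value such as 1 - 10^-30 rounds to exactly 1.0 in ordinary floating point.

```python
from fractions import Fraction


def error_probability(quality: int) -> Fraction:
    """Likelihood that a character is wrong, taken as 10^-quality."""
    if quality < 0:
        raise ValueError("quality must be a non-negative integer")
    return Fraction(1, 10 ** quality)


def correct_probability(quality: int) -> Fraction:
    """Likelihood that a character is correct, that is 1 - 10^-quality."""
    return 1 - error_probability(quality)
```

For a quality value of 3, this gives an error likelihood of 1/1000 and a correctness likelihood of 999/1000.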
The character data for the CONTIG is broken into SEQUENCEs so that we do not have to manipulate the entire CONTIG as a string in memory. This is important, because some CONTIGs can be hundreds of megacharacters in length.
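A minimal sketch of this splitting step follows. The function name, the tuple layout, the chunk size, and the 1-based begin point are all assumptions for illustration; the real loader may differ.

```python
def split_contig(contig_id, residues, chunk_size):
    """Split a CONTIG's characters into SEQUENCE records.

    Each record is (contig_id, begin_point, characters). The begin point is
    assumed to be 1-based, so the first record starts at position 1.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (contig_id, start + 1, residues[start:start + chunk_size])
        for start in range(0, len(residues), chunk_size)
    ]


def join_sequences(records):
    """Rebuild the CONTIG text by concatenating records in begin-point order."""
    return "".join(text for _, _, text in sorted(records, key=lambda r: r[1]))
```

Because each record carries its own begin point, a caller can fetch only the records that overlap a region of interest instead of loading the whole CONTIG.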
[feature-id,type] [feature-id,alias] [feature-id,DNA-sequence] [feature-id,translation] [feature-id,upstream-sequence] [feature-id,virulence] [feature-id,essentiality]
A single GENOME is composed of multiple CONTIGs.
A single GENOME can come from a single SOURCE or from cooperation by multiple SOURCEs. Multiple GENOMEs may come from a single SOURCE.
A single CONTIG is made up of multiple SEQUENCEs.
|start-position||ordinal number of this sequence in the CONTIG (for example, a start-position of 100 means that this sequence starts at the 100th position of the CONTIG)|
Multiple ANNOTATIONs can be made on a single FEATURE.
Multiple ANNOTATIONs can be made by a single USER.
Multiple ASSIGNMENTs can be made by a single USER.
Multiple ASSIGNMENTs can be made to a single FEATURE.
Multiple ASSIGNMENTs can describe a single ROLE. Multiple ROLEs can be implemented by a single ASSIGNMENT.
Multiple GENOMEs can participate in multiple SUBSYSTEMs.
|variant||description of the subsystem variant|
Multiple ROLEs can be achieved by multiple SUBSYSTEMs.
Multiple SSCELLs belong to a single GENOME.
Multiple SSCELLs relate to a single ROLE.
A single FEATURE is located in multiple CONTIGs; a CONTIG contains multiple FEATURE locations. This relationship enables us to find the gene sequences in the CONTIGs that make up the FEATURE.
To ensure that we are able to find all genes relating to a particular location, we impose a maximum size on each span encoded by this relationship. So, for example, if the maximum span size is 100 and we want to find all features that include position 321 of CONTIG ABC, we would search for location data with beginning points at positions 222 through 420, and only emit them if the length and direction cross the 321 location.
|locN||ordinal number of this location for the FEATURE|
|beg||position of this location's first nucleotide in the CONTIG|
|len||number of nucleotides used by this location in the CONTIG|
|dir||direction of the location from its beginning point in the CONTIG|
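The position search described above can be sketched as follows. This is an illustration under stated assumptions: the `Location` record, the `'+'`/`'-'` encoding of the direction, 1-based positions, and the in-memory scan (standing in for an indexed range query on the beginning point) are not taken from the actual implementation.

```python
from typing import List, NamedTuple, Tuple


class Location(NamedTuple):
    feature_id: str
    loc_n: int       # ordinal number of this location for the FEATURE
    beg: int         # position of the first nucleotide in the CONTIG
    length: int      # number of nucleotides used (the "len" field)
    direction: str   # assumed encoding: '+' forward, '-' backward


def search_window(position: int, max_span: int) -> Tuple[int, int]:
    """Range of beginning points that could possibly cover the position."""
    return position - max_span + 1, position + max_span - 1


def covered_range(loc: Location) -> Tuple[int, int]:
    """Lowest and highest CONTIG positions covered by a location."""
    if loc.direction == "+":
        return loc.beg, loc.beg + loc.length - 1
    return loc.beg - loc.length + 1, loc.beg


def features_at(locations: List[Location], position: int, max_span: int) -> List[str]:
    """Return IDs of features with a location that crosses the position."""
    low, high = search_window(position, max_span)
    hits = []
    for loc in locations:
        if not low <= loc.beg <= high:
            continue  # a real query would exclude these rows via the index
        first, last = covered_range(loc)
        if first <= position <= last:
            hits.append(loc.feature_id)
    return hits
```

For position 321 and a maximum span of 100, `search_window` yields beginning points 222 through 420, matching the example above. The window is only valid because no stored span exceeds the maximum size, which is why longer locations must be split into several rows.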
A single SSCELL contains multiple FEATUREs; a FEATURE may be contained in multiple SSCELLs.
Multiple FEATUREs are related to multiple other FEATUREs. This relationship is commutative.
|score||measurement of the level of the relationship|
|type||type of relationship (similarity, bidirectional best hit, or chromosome clustering)|
Multiple FUSIONs produce a single FEATURE.