<?xml version="1.0" encoding="utf-8" ?> <Database> <Title>Sprout Genome and Subsystem Database</Title> <Entities> <Entity name="Genome" keyType="name-string"> <Notes>A [i]genome[/i] contains the sequence data for a particular individual organism.</Notes> <Fields> <Field name="genus" type="name-string"> <Notes>Genus of the relevant organism.</Notes> <DataGen pass="1">RandParam('streptococcus', 'staphyloccocus', 'felis', 'homo', 'ficticio', 'strangera', 'escherischia', 'carborunda')</DataGen> </Field> <Field name="species" type="name-string"> <Notes>Species of the relevant organism.</Notes> <DataGen pass="1">StringGen('PKVKVKVKVKV')</DataGen> </Field> <Field name="unique-characterization" type="medium-string"> <Notes>The unique characterization identifies the particular organism instance from which the genome is taken. It is possible to have in the database more than one genome for a particular species, and every individual organism has variations in its DNA.</Notes> <DataGen>StringGen('PKVKVK999')</DataGen> </Field> <Field name="access-code" type="key-string"> <Notes>The access code determines which users can look at the data relating to this genome. Each user is associated with a set of access codes. 
In order to view a genome, one of the user's access codes must match this value.</Notes> <DataGen>RandParam('low','medium','high')</DataGen> </Field> <Field name="taxonomy" type="text"> <Notes>The taxonomy string contains the full taxonomy of the organism, with individual elements separated by semi-colons (and optional white space), starting with the domain and ending with the disambiguated genus and species (which is the organism's scientific name plus an identifying string).</Notes> <DataGen pass="2">join('; ', (RandParam('bacteria', 'archaea', 'eukaryote', 'virus', 'environmental'), ListGen('PKVKVKVK', 5), $this->{genus}, $this->{species}))</DataGen> </Field> <Field name="group-name" type="name-string" relation="GenomeGroups"> <Notes>The group identifies a special grouping of organisms that would be displayed on a particular page or would be of particular interest to a research group or web site. A single genome can belong to multiple such groups or none at all.</Notes> </Field> </Fields> <Indexes> <Index> <Notes>This index allows the applications to find all genomes associated with a specific access code, so that a complete list of the genomes users can view may be generated.</Notes> <IndexFields> <IndexField name="access-code" order="ascending" /> <IndexField name="genus" order="ascending" /> <IndexField name="species" order="ascending" /> <IndexField name="unique-characterization" order="ascending" /> </IndexFields> </Index> <Index Unique="false"> <Notes>This index allows the applications to find all genomes for a particular species.</Notes> <IndexFields> <IndexField name="genus" order="ascending" /> <IndexField name="species" order="ascending" /> <IndexField name="unique-characterization" order="ascending" /> </IndexFields> </Index> </Indexes> </Entity> <Entity name="Source" keyType="medium-string"> <Notes>A [i]source[/i] describes a place from which genome data was taken. 
This can be an organization or a paper citation.</Notes> <Fields> <Field name="URL" type="string" relation="SourceURL"> <Notes>URL of the paper cited or of the organization's web site. This field is optional.</Notes> <DataGen>"http://www.conservativecat.com/Ferdy/TestTarget.php?Source=" . $this->{id}</DataGen> </Field> <Field name="description" type="text"> <Notes>Description of the source. The description can be a street address or a citation.</Notes> <DataGen>$this->{id} . ': ' . StringGen(IntGen(50,200))</DataGen> </Field> </Fields> </Entity> <Entity name="Contig" keyType="name-string"> <Notes>A [i]contig[/i] is a contiguous run of residues. The contig's ID consists of the genome ID followed by a name that identifies which contig this is for the parent genome. As is the case with all keys in this database, the individual components are separated by a period. [p]A contig can contain over a million residues. For performance reasons, therefore, the contig is split into multiple pieces called [i]sequences[/i]. The sequences contain the characters that represent the residues as well as data on the quality of the residue identification.</Notes> </Entity> <Entity name="Sequence" keyType="name-string"> <Notes>A [i]sequence[/i] is a continuous piece of a [i]contig[/i]. Contigs are split into sequences so that we don't have to have the entire contig in memory when we are manipulating it. The key of the sequence is the contig ID followed by the index of the begin point.</Notes> <Fields> <Field name="sequence" type="text"> <Notes>String consisting of the residues. Each residue is described by a single character in the string.</Notes> <DataGen>RandChars("ACGT", IntGen(100,400))</DataGen> </Field> <Field name="quality-vector" type="text"> <Notes>String describing the quality data for each residue. Individual values will be separated by periods. Each value represents the negative exponent of the probability of error. Thus, for example, a quality of 30 indicates the probability of error is 10^-30. 
A higher quality number indicates a better chance of a correct match. It is possible that the quality data is not known for a sequence. If that is the case, the quality vector will contain the string [b]unknown[/b].</Notes> <DataGen>unknown</DataGen> </Field> </Fields> </Entity> <Entity name="Feature" keyType="name-string"> <Notes>A [i]feature[/i] is a part of a genome that is of special interest. Features may be spread across multiple contigs of a genome, but never across more than one genome. Features can be assigned to roles via spreadsheet cells, and are the targets of annotation.</Notes> <Fields> <Field name="feature-type" type="string"> <Notes>Code indicating the type of this feature.</Notes> <DataGen>RandParam('peg','rna')</DataGen> </Field> <Field name="alias" type="name-string" relation="FeatureAlias"> <Notes>Alternative name for this feature. A feature can have many aliases.</Notes> <DataGen testCount="3">StringGen('Pgi|99999', 'Puni|XXXXXX', 'PAAAAAA999')</DataGen> </Field> <Field name="translation" type="text" relation="FeatureTranslation"> <Notes>[i](optional)[/i] A translation of this feature's residues into character codes, formed by concatenating the pieces of the feature together.</Notes> <DataGen testCount="0"></DataGen> </Field> <Field name="upstream-sequence" type="text" relation="FeatureUpstream"> <Notes>Upstream sequence for the feature. This includes residues preceding the feature as well as some of the feature's initial residues.</Notes> <DataGen testCount="0"></DataGen> </Field> <Field name="active" type="boolean"> <Notes>TRUE if this feature is still considered valid, FALSE if it has been logically deleted.</Notes> <DataGen>1</DataGen> </Field> <Field name="link" type="text" relation="FeatureLink"> <Notes>Web hyperlink for this feature. A feature can have no hyperlinks or it can have many. 
The links are to other websites that have useful information about the gene that the feature represents, and are coded as raw HTML, using [b]&lt;a href="[i]link[/i]"&gt;[i]text[/i]&lt;/a&gt;[/b] notation.</Notes> <DataGen testCount="3">'http://www.conservativecat.com/Ferdy/TestTarget.php?Source=' . $this->{id} . "&amp;Number=" . IntGen(1,99)</DataGen> </Field> </Fields> </Entity> <Entity name="Role" keyType="string"> <Notes>A [i]role[/i] describes a biological function that may be fulfilled by a feature. One of the main goals of the database is to record the roles of the various features.</Notes> <Fields> <Field name="name" type="string" relation="RoleName"> <Notes>Expanded name of the role. This value is generally only available for roles that are encoded as EC numbers.</Notes> <DataGen testCount="1">StringGen(IntGen(20,40)) . "(" . $this->{id} . ")"</DataGen> </Field> </Fields> </Entity> <Entity name="Annotation" keyType="name-string"> <Notes>An [i]annotation[/i] contains supplementary information about a feature. Annotations are currently the only objects that may be inserted directly into the database. All other information is loaded from data exported by the SEED. [p]Each annotation is associated with a target [b]Feature[/b]. The key of the annotation is the target feature ID followed by a timestamp.</Notes> <Fields> <Field name="time" type="date"> <Notes>Date and time of the annotation.</Notes> </Field> <Field name="annotation" type="text"> <Notes>Text of the annotation.</Notes> </Field> </Fields> </Entity> <Entity name="Subsystem" keyType="string"> <Notes>A [i]subsystem[/i] is a collection of roles that work together in a cell. Identification of subsystems is an important tool for recognizing parallel genetic features in different organisms.</Notes> </Entity> <Entity name="SSCell" keyType="name-string"> <Notes>Part of the process of locating and assigning features is creating a spreadsheet of genomes and roles to which features are assigned. 
A [i]spreadsheet cell[/i] represents one of the positions on the spreadsheet.</Notes> </Entity> <Entity name="SproutUser" keyType="name-string"> <Notes>A [i]user[/i] is a person who can make annotations and view data in the database. The user object is keyed on the user's login name.</Notes> <Fields> <Field name="description" type="string"> <Notes>Full name or description of this user.</Notes> </Field> <Field name="access-code" type="key-string" relation="UserAccess"> <Notes>Access code possessed by this user. A user can have many access codes; a genome is accessible to the user if its access code matches any one of the user's access codes.</Notes> <DataGen testCount="2">RandParam('low', 'medium', 'high')</DataGen> </Field> </Fields> </Entity> <Entity name="Property" keyType="int"> <Notes>A [i]property[/i] is a type of assertion that could be made about the properties of a particular feature. Each property instance is a key/value pair and can be associated with many different features. Conversely, a feature can be associated with many key/value pairs, even some that notionally contradict each other. For example, there can be evidence that a feature is essential to the organism's survival and evidence that it is superfluous.</Notes> <Fields> <Field name="property-name" type="name-string"> <Notes>Name of this property.</Notes> </Field> <Field name="property-value" type="string"> <Notes>Value associated with this property. 
For each property name, there must be a property record for all of its possible values.</Notes> </Field> </Fields> <Indexes> <Index> <Notes>This index enables the application to find all values for a specified property name, or any given name/value pair.</Notes> <IndexFields> <IndexField name="property-name" order="ascending" /> <IndexField name="property-value" order="ascending" /> </IndexFields> </Index> </Indexes> </Entity> <Entity name="Diagram" keyType="name-string"> <Notes>A functional diagram describes the chemical reactions, often comprising a single subsystem. A diagram is identified by a short name and contains a longer descriptive name. The actual diagram shows which functional roles guide the reactions along with the inputs and outputs; the database, however, only indicates which roles belong to a particular map.</Notes> <Fields> <Field name="name" type="text"> <Notes>Descriptive name of this diagram.</Notes> </Field> </Fields> </Entity> <Entity name="ExternalAliasOrg" keyType="name-string"> <Notes>An external alias is a feature name for a functional assignment that is not a FIG ID. Functional assignments for external aliases are kept in a separate section of the database. This table contains a description of the relevant organism for an external alias functional assignment.</Notes> <Fields> <Field name="org" type="text"> <Notes>Descriptive name of the target organism for this external alias.</Notes> </Field> </Fields> </Entity> <Entity name="ExternalAliasFunc" keyType="name-string"> <Notes>An external alias is a feature name for a functional assignment that is not a FIG ID. Functional assignments for external aliases are kept in a separate section of the database. 
This table contains the functional role for the external alias functional assignment.</Notes> <Fields> <Field name="func" type="text"> <Notes>Functional role for this external alias.</Notes> </Field> </Fields> </Entity> </Entities> <Relationships> <Relationship name="HasContig" from="Genome" to="Contig" arity="1M"> <Notes>This relationship connects a genome to the contigs that contain the actual genetic information.</Notes> </Relationship> <Relationship name="ComesFrom" from="Genome" to="Source" arity="MM"> <Notes>This relationship connects a genome to the sources that mapped it. A genome can come from a single source or from a cooperation among multiple sources.</Notes> </Relationship> <Relationship name="IsMadeUpOf" from="Contig" to="Sequence" arity="1M"> <Notes>A contig is stored in the database as an ordered set of sequences. By splitting the contig into sequences, we get a performance boost from only needing to keep small portions of a contig in memory at any one time. This relationship connects the contig to its constituent sequences.</Notes> <Fields> <Field name="len" type="int"> <Notes>Length of the sequence.</Notes> </Field> <Field name="start-position" type="int"> <Notes>Index (1-based) of the point in the contig where this sequence starts.</Notes> </Field> </Fields> <FromIndex> <Notes>This index enables the application to find all of the sequences in a contig in order, and makes it easier to find a particular residue section.</Notes> <IndexFields> <IndexField name="start-position" order="ascending" /> <IndexField name="len" order="ascending" /> </IndexFields> </FromIndex> </Relationship> <Relationship name="IsTargetOfAnnotation" from="Feature" to="Annotation" arity="1M"> <Notes>This relationship connects a feature to its annotations.</Notes> </Relationship> <Relationship name="MadeAnnotation" from="SproutUser" to="Annotation" arity="1M"> <Notes>This relationship connects an annotation to the user who made it.</Notes> </Relationship> <Relationship 
name="ParticipatesIn" from="Genome" to="Subsystem" arity="MM"> <Notes>This relationship connects a subsystem to the genomes that use it. If the subsystem has been curated for the genome, then the subsystem's roles will also be connected to the genome features through the [b]SSCell[/b] object.</Notes> </Relationship> <Relationship name="OccursInSubsystem" from="Role" to="Subsystem" arity="MM"> <Notes>This relationship connects roles to the subsystems that implement them. </Notes> </Relationship> <Relationship name="IsGenomeOf" from="Genome" to="SSCell" arity="1M"> <Notes>This relationship connects a subsystem's spreadsheet cell to the genome for the spreadsheet column.</Notes> </Relationship> <Relationship name="IsRoleOf" from="Role" to="SSCell" arity="1M"> <Notes>This relationship connects a subsystem's spreadsheet cell to the role for the spreadsheet row.</Notes> </Relationship> <Relationship name="ContainsFeature" from="SSCell" to="Feature" arity="MM"> <Notes>This relationship connects a subsystem's spreadsheet cell to the features assigned to it.</Notes> </Relationship> <Relationship name="IsLocatedIn" from="Feature" to="Contig" arity="MM"> <Notes>This relationship connects a feature to the contig segments that work together to effect it. The segments are numbered sequentially starting from 1. The database is required to place an upper limit on the length of each segment. If a segment is longer than the maximum, it can be broken into smaller bits. [p]The upper limit enables applications to locate all features that contain a specific residue. For example, if the upper limit is 100 and we are looking for a feature that contains residue 234 of contig [b]ABC[/b], we can look for features with a begin point between 135 and 333. 
The results can then be filtered by direction and length of the segment.</Notes> <Fields> <Field name="locN" type="int"> <Notes>Sequence number of this segment.</Notes> </Field> <Field name="beg" type="int"> <Notes>Index (1-based) of the first residue in the contig that belongs to the segment.</Notes> </Field> <Field name="len" type="int"> <Notes>Number of residues in the segment. A length of 0 identifies a specific point between residues. This is the point before the residue if the direction is forward and the point after the residue if the direction is backward.</Notes> </Field> <Field name="dir" type="char"> <Notes>Direction of the segment: [b]+[/b] if it is forward and [b]-[/b] if it is backward.</Notes> </Field> </Fields> <FromIndex Unique="false"> <Notes>This index allows the application to find all the segments of a feature in the proper order.</Notes> <IndexFields> <IndexField name="locN" order="ascending" /> </IndexFields> </FromIndex> <ToIndex> <Notes>This index is the one used by applications to find all the feature segments that contain a specific residue.</Notes> <IndexFields> <IndexField name="beg" order="ascending" /> </IndexFields> </ToIndex> </Relationship> <Relationship name="IsClusteredOnChromosomeWith" from="Feature" to="Feature" arity="MM"> <Notes>This relationship is one of two that relate features to each other. It connects features that are physically close to each other on a single chromosome.</Notes> <Fields> <Field name="score" type="int"> <Notes>The number of co-occurrences in genomes that are not extremely closely-related.</Notes> </Field> </Fields> </Relationship> <Relationship name="IsBidirectionalBestHitOf" from="Feature" to="Feature" arity="MM"> <Notes>This relationship is one of two that relate features to each other. It connects features that are very similar but on separate genomes. 
A bidirectional best hit relationship exists between two features [b]A[/b] and [b]B[/b] if [b]A[/b] is the best match for [b]B[/b] on [b]A[/b]'s genome and [b]B[/b] is the best match for [b]A[/b] on [b]B[/b]'s genome. </Notes> <Fields> <Field name="genome" type="name-string"> <Notes>ID of the genome containing the target (to) feature.</Notes> </Field> <Field name="sc" type="float"> <Notes>score for this relationship</Notes> </Field> </Fields> <FromIndex> <Notes>This index allows the application to find a feature's best hit for a specific target genome.</Notes> <IndexFields> <IndexField name="genome" order="ascending" /> </IndexFields> </FromIndex> </Relationship> <Relationship name="HasProperty" from="Feature" to="Property" arity="MM"> <Notes>This relationship connects a feature to its known property values. The relationship contains text data that indicates the paper or organization that discovered evidence that the feature possesses the property. So, for example, if two papers presented evidence that a feature is essential, there would be an instance of this relationship for both.</Notes> <Fields> <Field name="evidence" type="text"> <Notes>URL or citation of the paper or institution that reported evidence of the relevant feature possessing the specified property value.</Notes> </Field> </Fields> </Relationship> <Relationship name="RoleOccursIn" from="Role" to="Diagram" arity="MM"> <Notes>This relationship connects a role to the diagrams on which it appears. A role frequently identifies an enzyme, and can appear in many diagrams. A diagram generally contains many different roles.</Notes> </Relationship> <Relationship name="HasSSCell" from="Subsystem" to="SSCell" arity="1M"> <Notes>This relationship connects a subsystem to the spreadsheet cells used to analyze and display it. The cells themselves can be thought of as a grid with Roles on one axis and Genomes on the other. 
The various features of the subsystem are then assigned to the cells.</Notes> </Relationship> <Relationship name="IsTrustedBy" from="SproutUser" to="SproutUser" arity="MM"> <Notes>This relationship identifies the users trusted by each particular user. When viewing functional assignments, the assignment displayed is the most recent one by a user trusted by the current user. The current user implicitly trusts himself. If no trusted users are specified in the database, the user also implicitly trusts the user [b]FIG[/b].</Notes> </Relationship> </Relationships> </Database>