
View of /Sprout/SproutDBD.xml



Revision 1.4
Wed Jan 26 22:26:09 2005 UTC (14 years, 10 months ago) by parrello
Branch: MAIN
Changes since 1.3: +1 -1 lines
no message

<?xml version="1.0" encoding="utf-8" ?>
<Database>
    <Title>Sprout Genome and Subsystem Database</Title>
    <Entities>
        <Entity name="Genome" keyType="name-string">
            <Notes>A [i]genome[/i] contains the sequence data for a particular individual organism.</Notes>
            <Fields>
                <Field name="genus" type="name-string">
                    <Notes>Genus of the relevant organism.</Notes>
                    <DataGen pass="1">RandParam('streptococcus', 'staphyloccocus', 'felis', 'homo', 'ficticio', 'strangera', 'escherischia', 'carborunda')</DataGen>
                </Field>
                <Field name="species" type="name-string">
		    		<Notes>Species of the relevant organism.</Notes>
                    <DataGen pass="1">StringGen('PKVKVKVKVKV')</DataGen>
				</Field>
                <Field name="unique-characterization" type="medium-string">
                	<Notes>The unique characterization identifies the particular organism instance from which the
	                genome is taken. It is possible to have in the database more than one genome for a
                    particular species, and every individual organism has variations in its DNA.</Notes>
					<DataGen>StringGen('PKVKVK999')</DataGen>
                </Field>
                <Field name="access-code" type="key-string">
                	<Notes>The access code determines which users can look at the data relating to this genome.
                	Each user is associated with a set of access codes. In order to view a genome, one of
                	the user's access codes must match this value.</Notes>
					<DataGen>RandParam('low','medium','high')</DataGen>
                </Field>
				<Field name="taxonomy" type="text">
					<Notes>The taxonomy string contains the full taxonomy of the organism, with individual elements
					separated by semi-colons (and optional white space), starting with the domain and ending with
					the disambiguated genus and species (which is the organism's scientific name plus an
					identifying string).</Notes>
					<DataGen pass="2">join('; ', (RandParam('bacteria', 'archaea', 'eukaryote', 'virus', 'environmental'),
												  ListGen('PKVKVKVK', 5), $this->{genus}, $this->{species}))</DataGen>
				</Field>
				<Field name="group-name" type="name-string" relation="GenomeGroups">
					<Notes>The group identifies a special grouping of organisms that would be displayed on a particular
					page or of particular interest to a research group or web site. A single genome can belong to multiple
					such groups or none at all.</Notes>
				</Field>
            </Fields>
            <Indexes>
                <Index>
                    <Notes>This index allows the applications to find all genomes associated with
                    a specific access code, so that a complete list of the genomes users can view
                    may be generated.</Notes>
                    <IndexFields>
                        <IndexField name="access-code" order="ascending" />
                        <IndexField name="genus" order="ascending" />
                        <IndexField name="species" order="ascending" />
                        <IndexField name="unique-characterization" order="ascending" />
                    </IndexFields>
                </Index>
                <Index Unique="false">
                    <Notes>This index allows the applications to find all genomes for a particular
                    species.</Notes>
                    <IndexFields>
                        <IndexField name="genus" order="ascending" />
                        <IndexField name="species" order="ascending" />
                        <IndexField name="unique-characterization" order="ascending" />
                    </IndexFields>
                </Index>
            </Indexes>
        </Entity>
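<!--
The taxonomy Notes above describe a string of elements separated by semi-colons
with optional white space, starting with the domain. A minimal Python sketch of
splitting such a string follows. The function name split_taxonomy and the sample
value are hypothetical and are not part of Sprout itself.

```python
def split_taxonomy(taxonomy):
    """Split a taxonomy string on semi-colons, dropping surrounding white space."""
    parts = [part.strip() for part in taxonomy.split(";")]
    # Ignore empty elements, e.g. those left by a trailing semi-colon.
    return [part for part in parts if part]


if __name__ == "__main__":
    sample = "bacteria; Firmicutes;Bacilli ;  Streptococcus pyogenes M1"
    elements = split_taxonomy(sample)
    print("domain:", elements[0])
    print("elements:", elements)
```
-->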
        <Entity name="Source" keyType="medium-string">
            <Notes>A [i]source[/i] describes a place from which genome data was taken. This can be an organization
            or a paper citation.</Notes>
            <Fields>
                <Field name="URL" type="string" relation="SourceURL">
					<Notes>URL of the paper cited or of the organization's web site. This field is optional.</Notes>
					<DataGen>"http://www.conservativecat.com/Ferdy/TestTarget.php?Source=" . $this->{id}</DataGen>
				</Field>
                <Field name="description" type="text">
					<Notes>Description of the source. The description can be a street address or a citation.</Notes>
					<DataGen>$this->{id} . ': ' . StringGen(IntGen(50,200))</DataGen>
				</Field>
            </Fields>
        </Entity>
        <Entity name="Contig" keyType="name-string">
            <Notes>A [i]contig[/i] is a contiguous run of residues. The contig's ID consists of the
            genome ID followed by a name that identifies which contig this is for the parent genome. As
            is the case with all keys in this database, the individual components are separated by a
            period.
            [p]A contig can contain over a million residues. For performance reasons, therefore,
            the contig is split into multiple pieces called [i]sequences[/i]. The sequences
            contain the characters that represent the residues as well as data on the quality of
            the residue identification.</Notes>
        </Entity>
        <Entity name="Sequence" keyType="name-string">
            <Notes>A [i]sequence[/i] is a continuous piece of a [i]contig[/i]. Contigs are split into
            sequences so that we don't have to have the entire contig in memory when we are
            manipulating it. The key of the sequence is the contig ID followed by the index of
            the begin point.</Notes>
            <Fields>
                <Field name="sequence" type="text">
					<Notes>String consisting of the residues. Each residue is described by a single
					character in the string.</Notes>
					<DataGen>RandChars("ACGT", IntGen(100,400))</DataGen>
				</Field>
                <Field name="quality-vector" type="text">
					<Notes>String describing the quality data for each residue. Individual values will
					be separated by periods. The value represents the negative exponent of the probability
					of error. Thus, for example, a quality of 30 indicates the probability of error is
					10^-30. A higher quality number indicates a better chance of a correct match. It is possible
					that the quality data is not known for a sequence. If that is the case, the quality
					vector will contain the string [b]unknown[/b].</Notes>
					<DataGen>unknown</DataGen>
				</Field>
            </Fields>
        </Entity>
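<!--
The quality-vector field holds period-separated values, each read as the
negative exponent of an error probability, or the literal value unknown.
A minimal Python sketch of decoding it follows. It takes the exponent rule
exactly as the Notes state it (a value q means an error probability of 10^-q);
the function name parse_quality_vector is hypothetical.

```python
def parse_quality_vector(vector):
    """Return per-residue error probabilities, or None when quality is unknown."""
    if vector.strip() == "unknown":
        return None
    # Each period-separated value q is read as an error probability of 10**-q.
    return [10.0 ** -int(q) for q in vector.split(".")]


if __name__ == "__main__":
    print(parse_quality_vector("unknown"))
    print(parse_quality_vector("30.20.10"))
```
-->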
        <Entity name="Feature" keyType="name-string">
            <Notes>A [i]feature[/i] is a part of a genome that is of special interest. Features
            may be spread across multiple contigs of a genome, but never across more than
            one genome. Features can be assigned to roles via spreadsheet cells,
            and are the targets of annotation.</Notes>
            <Fields>
                <Field name="feature-type" type="string">
					<Notes>Code indicating the type of this feature.</Notes>
					<DataGen>RandParam('peg','rna')</DataGen>
				</Field>
                <Field name="alias" type="name-string" relation="FeatureAlias">
					<Notes>Alternative name for this feature. A feature can have many aliases.</Notes>
					<DataGen testCount="3">StringGen('Pgi|99999', 'Puni|XXXXXX', 'PAAAAAA999')</DataGen>
				</Field>
                <Field name="translation" type="text" relation="FeatureTranslation">
					<Notes>[i](optional)[/i] A translation of this feature's residues into character codes, formed by concatenating
	                the pieces of the feature together.</Notes>
					<DataGen testCount="0"></DataGen>
				</Field>
                <Field name="upstream-sequence" type="text" relation="FeatureUpstream">
					<Notes>Upstream sequence of the feature. This includes residues preceding the feature as well as some of
					the feature's initial residues.</Notes>
					<DataGen testCount="0"></DataGen>
				</Field>
                <Field name="active" type="boolean">
					<Notes>TRUE if this feature is still considered valid, FALSE if it has been logically deleted.</Notes>
					<DataGen>1</DataGen>
				</Field>
				<Field name="link" type="text" relation="FeatureLink">
					<Notes>Web hyperlink for this feature. A feature can have no hyperlinks or it can have many. The
					links are to other websites that have useful information about the gene that the feature represents, and
					are coded as raw HTML, using [b]&lt;a href="[i]link[/i]"&gt;[i]text[/i]&lt;/a&gt;[/b] notation.</Notes>
					<DataGen testCount="3">'http://www.conservativecat.com/Ferdy/TestTarget.php?Source=' . $this->{id} .
					"&amp;Number=" . IntGen(1,99)</DataGen>
				</Field>
            </Fields>
        </Entity>
        <Entity name="Role" keyType="string">
            <Notes>A [i]role[/i] describes a biological function that may be fulfilled by a feature.
            One of the main goals of the database is to record the roles of the various features.</Notes>
			<Fields>
				<Field name="name" type="string" relation="RoleName">
					<Notes>Expanded name of the role. This value is generally only available for roles
					that are encoded as EC numbers.</Notes>
					<DataGen testCount="1">StringGen(IntGen(20,40)) . "(" . $this->{id} . ")"</DataGen>
				</Field>
			</Fields>
        </Entity>
        <Entity name="Annotation" keyType="name-string">
            <Notes>An [i]annotation[/i] contains supplementary information about a feature. Annotations
			are currently the only objects that may be inserted directly into the database. All other
			information is loaded from data exported by the SEED.
			[p]Each annotation is associated with a target [b]Feature[/b]. The key of the annotation
			is the target feature ID followed by a timestamp.</Notes>
            <Fields>
               	<Field name="time" type="date">
					<Notes>Date and time of the annotation.</Notes>
				</Field>
				<Field name="annotation" type="text">
					<Notes>Text of the annotation.</Notes>
				</Field>
            </Fields>
        </Entity>
        <Entity name="Subsystem" keyType="name-string">
            <Notes>A [i]subsystem[/i] is a collection of roles that work together in a cell. Identification of subsystems
            is an important tool for recognizing parallel genetic features in different organisms.</Notes>
        </Entity>
        <Entity name="SSCell" keyType="name-string">
            <Notes>Part of the process of locating and assigning features is creating a spreadsheet of
            genomes and roles to which features are assigned. A [i]spreadsheet cell[/i] represents one
            of the positions on the spreadsheet.</Notes>
        </Entity>
        <Entity name="SproutUser" keyType="name-string">
            <Notes>A [i]user[/i] is a person who can make annotations and view data in the database. The
            user object is keyed on the user's login name.</Notes>
            <Fields>
				<Field name="description" type="string">
					<Notes>Full name or description of this user.</Notes>
				</Field>
                <Field name="access-code" type="key-string" relation="UserAccess">
					<Notes>Access code possessed by this
                    user. A user can have many access codes; a genome is accessible to the user if its
                    access code matches any one of the user's access codes.</Notes>
					<DataGen testCount="2">RandParam('low', 'medium', 'high')</DataGen>
				</Field>
            </Fields>
        </Entity>
		<Entity name="Property" keyType="int">
			<Notes>A [i]property[/i] is a type of assertion that could be made about the properties of
			a particular feature. Each property instance is a key/value pair and can be associated
			with many different features. Conversely, a feature can be associated with many key/value
			pairs, even some that notionally contradict each other. For example, there can be evidence
			that a feature is essential to the organism's survival and evidence that it is superfluous.</Notes>
			<Fields>
				<Field name="property-name" type="name-string">
					<Notes>Name of this property.</Notes>
				</Field>
				<Field name="property-value" type="string">
					<Notes>Value associated with this property. For each property
					name, there must be a property record for all of its possible
					values.</Notes>
				</Field>
			</Fields>
			<Indexes>
				<Index>
					<Notes>This index enables the application to find all values for a specified property
					name, or any given name/value pair.</Notes>
					<IndexFields>
						<IndexField name="property-name" order="ascending" />
						<IndexField name="property-value" order="ascending" />
					</IndexFields>
				</Index>
			</Indexes>
		</Entity>
		<Entity name="Diagram" keyType="name-string">
			<Notes>A functional diagram describes the chemical reactions, often comprising a single
			subsystem. A diagram is identified by a short name and contains a longer descriptive name.
			The actual diagram shows which functional roles guide the reactions along with the inputs
			and outputs; the database, however, only indicates which roles belong to a particular
			map.</Notes>
			<Fields>
				<Field name="name" type="text">
					<Notes>Descriptive name of this diagram.</Notes>
				</Field>
			</Fields>
		</Entity>
		<Entity name="ExternalAliasOrg" keyType="name-string">
			<Notes>An external alias is a feature name for a functional assignment that is not a
			FIG ID. Functional assignments for external aliases are kept in a separate section of
			the database. This table contains a description of the relevant organism for an
			external alias functional assignment.</Notes>
				<Fields>
					<Field name="org" type="text">
						<Notes>Descriptive name of the target organism for this external alias.</Notes>
					</Field>
				</Fields>
		</Entity>
		<Entity name="ExternalAliasFunc" keyType="name-string">
			<Notes>An external alias is a feature name for a functional assignment that is not a
			FIG ID. Functional assignments for external aliases are kept in a separate section of
			the database. This table contains the functional role for the external alias functional
			assignment.</Notes>
				<Fields>
					<Field name="func" type="text">
						<Notes>Functional role for this external alias.</Notes>
					</Field>
				</Fields>
		</Entity>
    </Entities>
    <Relationships>
        <Relationship name="HasContig" from="Genome" to="Contig" arity="1M">
            <Notes>This relationship connects a genome to the contigs that contain the actual genetic
            information.</Notes>
        </Relationship>
        <Relationship name="ComesFrom" from="Genome" to="Source" arity="MM">
            <Notes>This relationship connects a genome to the sources that mapped it. A genome can
            come from a single source or from a cooperation among multiple sources.</Notes>
        </Relationship>
        <Relationship name="IsMadeUpOf" from="Contig" to="Sequence" arity="1M">
            <Notes>A contig is stored in the database as an ordered set of sequences. By splitting the
            contig into sequences, we get a performance boost from only needing to keep small portions
            of a contig in memory at any one time. This relationship connects the contig to its
            constituent sequences.</Notes>
            <Fields>
                <Field name="len" type="int">
			<Notes>Length of the sequence.</Notes>
		</Field>
                <Field name="start-position" type="int">
			<Notes>Index (1-based) of the point in the contig where this
                	sequence starts.</Notes>
		</Field>
            </Fields>
            <FromIndex>
                <Notes>This index enables the application to find all of the sequences in
               	a contig in order, and makes it easier to find a particular residue section.</Notes>
                <IndexFields>
                    <IndexField name="start-position" order="ascending" />
                    <IndexField name="len" order="ascending" />
                </IndexFields>
            </FromIndex>
        </Relationship>
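<!--
IsMadeUpOf stores a contig as ordered sequences, each with a 1-based
start-position and a len, and the Sequence key is the contig ID followed by the
begin index, joined with a period. A minimal Python sketch of that layout
follows. The names split_contig and find_piece, the max_len parameter, and the
sample contig ID are hypothetical; the real loader may choose piece sizes
differently.

```python
def split_contig(contig_id, residues, max_len):
    """Return (sequence_key, start_position, length, text) pieces of a contig."""
    pieces = []
    for offset in range(0, len(residues), max_len):
        text = residues[offset:offset + max_len]
        start = offset + 1  # start-position is 1-based
        key = f"{contig_id}.{start}"  # contig ID, a period, then the begin index
        pieces.append((key, start, len(text), text))
    return pieces


def find_piece(pieces, residue_index):
    """Return the piece covering the 1-based residue index, or None."""
    for piece in pieces:
        _key, start, length, _text = piece
        if start <= residue_index < start + length:
            return piece
    return None


if __name__ == "__main__":
    for piece in split_contig("G1.C1", "ACGTACGTAC", 4):
        print(piece)
```
-->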
        <Relationship name="IsTargetOfAnnotation" from="Feature" to="Annotation" arity="1M">
            <Notes>This relationship connects a feature to its annotations.</Notes>
        </Relationship>
        <Relationship name="MadeAnnotation" from="SproutUser" to="Annotation" arity="1M">
            <Notes>This relationship connects an annotation to the user who made it.</Notes>
        </Relationship>
        <Relationship name="ParticipatesIn" from="Genome" to="Subsystem" arity="MM">
            <Notes>This relationship connects a subsystem to the genomes that use
            it. If the subsystem has been curated for the genome, then the subsystem's roles will also be
            connected to the genome features through the [b]SSCell[/b] object.</Notes>
        </Relationship>
        <Relationship name="OccursInSubsystem" from="Role" to="Subsystem" arity="MM">
            <Notes>This relationship connects roles to the subsystems that implement them. </Notes>
        </Relationship>
        <Relationship name="IsGenomeOf" from="Genome" to="SSCell" arity="1M">
            <Notes>This relationship connects a subsystem's spreadsheet cell to the
            genome for the spreadsheet column.</Notes>
        </Relationship>
        <Relationship name="IsRoleOf" from="Role" to="SSCell" arity="1M">
            <Notes>This relationship connects a subsystem's spreadsheet cell to the
            role for the spreadsheet row.</Notes>
        </Relationship>
        <Relationship name="ContainsFeature" from="SSCell" to="Feature" arity="MM">
            <Notes>This relationship connects a subsystem's spreadsheet cell to the
            features assigned to it.</Notes>
        </Relationship>
        <Relationship name="IsLocatedIn" from="Feature" to="Contig" arity="MM">
            <Notes>This relationship connects a feature to the contig segments that work together
            to effect it. The segments are numbered sequentially starting from 1. The database is
            required to place an upper limit on the length of each segment. If a segment is longer
            than the maximum, it can be broken into smaller bits.
            [p]The upper limit enables applications to locate all features that contain a specific
            residue. For example, if the upper limit is 100 and we are looking for a feature that
            contains residue 234 of contig [b]ABC[/b], we can look for features with a begin point
            between 135 and 333. The results can then be filtered by direction and length of the
            segment.</Notes>
            <Fields>
                <Field name="locN" type="int">
			<Notes>Sequence number of this segment.</Notes>
		</Field>
                <Field name="beg" type="int">
			<Notes>Index (1-based) of the first residue in the contig that
                	belongs to the segment.</Notes>
		</Field>
                <Field name="len" type="int">
			<Notes>Number of residues in the segment. A length of 0 identifies
                	a specific point between residues. This is the point before the residue if the direction
                	is forward and the point after the residue if the direction is backward.</Notes>
		</Field>
                <Field name="dir" type="char">
			<Notes>Direction of the segment: [b]+[/b] if it is forward and
                	[b]-[/b] if it is backward.</Notes>
		</Field>
            </Fields>
            <FromIndex Unique="false">
                <Notes>This index allows the application to find all the segments of a feature in
               	the proper order.</Notes>
                <IndexFields>
                    <IndexField name="locN" order="ascending" />
                </IndexFields>
            </FromIndex>
            <ToIndex>
                <Notes>This index is the one used by applications to find all the feature
                segments that contain a specific residue.</Notes>
                <IndexFields>
                    <IndexField name="beg" order="ascending" />
                </IndexFields>
            </ToIndex>
        </Relationship>
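<!--
The IsLocatedIn Notes explain that capping the segment length lets an
application find every segment containing a residue by scanning a bounded range
of begin points, then filtering by direction and length. A minimal Python
sketch of that lookup follows. It assumes a forward segment covers beg through
beg + len - 1 and a backward segment covers beg - len + 1 through beg; those
conventions and all function names are assumptions, chosen because they
reproduce the 135 to 333 window given in the Notes.

```python
def begin_window(residue, max_len):
    """Lowest and highest begin point a segment containing the residue can have."""
    return residue - max_len + 1, residue + max_len - 1


def contains(beg, length, direction, residue):
    """True if the segment covers the residue under the assumed conventions."""
    if length <= 0:
        return False  # a zero-length segment marks a point between residues
    if direction == "+":
        return beg <= residue <= beg + length - 1
    return beg - length + 1 <= residue <= beg


def segments_containing(segments, residue, max_len):
    """Filter (beg, len, dir) tuples: first by begin window, then exactly."""
    low, high = begin_window(residue, max_len)
    return [s for s in segments
            if low <= s[0] <= high and contains(s[0], s[1], s[2], residue)]


if __name__ == "__main__":
    print(begin_window(234, 100))
```
-->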
        <Relationship name="IsClusteredOnChromosomeWith" from="Feature" to="Feature" arity="MM">
            <Notes>This relationship is one of two that relate features to each other. It connects
            features that are physically close to each other on a single chromosome.</Notes>
            <Fields>
                <Field name="score" type="int">
			<Notes>The number of co-occurrences in genomes that are not
                	extremely closely-related.</Notes>
		</Field>
            </Fields>
        </Relationship>
        <Relationship name="IsBidirectionalBestHitOf" from="Feature" to="Feature" arity="MM">
            <Notes>This relationship is one of two that relate features to each other. It
            connects features that are very similar but on separate genomes. A
            bidirectional best hit relationship exists between two features [b]A[/b]
            and [b]B[/b] if [b]A[/b] is the best match for [b]B[/b] on [b]A[/b]'s genome
            and [b]B[/b] is the best match for [b]A[/b] on [b]B[/b]'s genome. </Notes>
            <Fields>
                <Field name="genome" type="name-string">
					<Notes>ID of the genome containing the target (to) feature.</Notes>
				</Field>
				<Field name="sc" type="float">
					<Notes>Score for this relationship.</Notes>
				</Field>
            </Fields>
            <FromIndex>
                <Notes>This index allows the application to find a feature's best hit for
              	a specific target genome.</Notes>
                <IndexFields>
                    <IndexField name="genome" order="ascending" />
                </IndexFields>
            </FromIndex>
        </Relationship>
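<!--
The bidirectional best hit rule above is symmetric: A must be the best match
for B on A's genome, and B must be the best match for A on B's genome. A
minimal Python sketch of that test follows. The dictionaries genome_of and
best_hit, the function name is_bbh, and the feature IDs in the usage lines are
hypothetical stand-ins for data that would come from the database.

```python
def is_bbh(a, b, genome_of, best_hit):
    """True if features a and b are bidirectional best hits.

    genome_of maps a feature ID to its genome ID.
    best_hit maps (feature ID, target genome ID) to the best matching feature.
    """
    return (best_hit.get((b, genome_of[a])) == a
            and best_hit.get((a, genome_of[b])) == b)


if __name__ == "__main__":
    genome_of = {"g1.peg.1": "g1", "g2.peg.7": "g2"}
    best_hit = {("g1.peg.1", "g2"): "g2.peg.7", ("g2.peg.7", "g1"): "g1.peg.1"}
    print(is_bbh("g1.peg.1", "g2.peg.7", genome_of, best_hit))
```
-->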
		<Relationship name="HasProperty" from="Feature" to="Property" arity="MM">
			<Notes>This relationship connects a feature to its known property values.
			The relationship contains text data that indicates the paper or organization
			that discovered evidence that the feature possesses the property. So, for
			example, if two papers presented evidence that a feature is essential,
			there would be an instance of this relationship for both.</Notes>
			<Fields>
				<Field name="evidence" type="text">
					<Notes>URL or citation of the paper or
					institution that reported evidence of the relevant feature possessing
					the specified property value.</Notes>
				</Field>
			</Fields>
		</Relationship>
		<Relationship name="RoleOccursIn" from="Role" to="Diagram" arity="MM">
			<Notes>This relationship connects a role to the diagrams on which it
			appears. A role frequently identifies an enzyme, and can appear in many
			diagrams. A diagram generally contains many different roles.</Notes>
		</Relationship>
		<Relationship name="HasSSCell" from="Subsystem" to="SSCell" arity="1M">
			<Notes>This relationship connects a subsystem to the spreadsheet cells
			used to analyze and display it. The cells themselves can be thought of
			as a grid with Roles on one axis and Genomes on the other. The
			various features of the subsystem are then assigned to the cells.</Notes>
		</Relationship>
		<Relationship name="IsTrustedBy" from="SproutUser" to="SproutUser" arity="MM">
			<Notes>This relationship identifies the users trusted by each
			particular user. When viewing functional assignments, the
			assignment displayed is the most recent one by a user trusted
			by the current user. The current user implicitly trusts himself.
			If no trusted users are specified in the database, the user
			also implicitly trusts the user [b]FIG[/b].</Notes>
		</Relationship>
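<!--
The IsTrustedBy Notes describe how the displayed functional assignment is
chosen: the most recent one made by a user the current user trusts, where a
user always trusts himself and falls back to FIG when no trusted users are
specified. A minimal Python sketch follows. It assumes the fallback applies per
user (when that user has no explicit trust entries); that reading, the data
shapes, and the function names are assumptions, not Sprout's actual code.

```python
def trusted_users(user, trusts):
    """Return the set of users whose assignments this user accepts."""
    explicit = set(trusts.get(user, ()))
    result = explicit | {user}  # a user implicitly trusts himself
    if not explicit:
        result.add("FIG")  # fallback when no trusted users are specified
    return result


def displayed_assignment(user, trusts, assignments):
    """Pick the most recent assignment text made by a trusted user.

    assignments is a list of (timestamp, author, text) tuples.
    """
    allowed = trusted_users(user, trusts)
    candidates = [a for a in assignments if a[1] in allowed]
    if not candidates:
        return None
    return max(candidates, key=lambda a: a[0])[2]


if __name__ == "__main__":
    trusts = {"alice": ["bob"]}
    history = [(1, "FIG", "kinase"), (2, "bob", "transporter"), (3, "carol", "permease")]
    print(displayed_assignment("alice", trusts, history))
```
-->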
    </Relationships>
</Database>
