<!-- /Sprout/SaplingDBD.xml, revision 1.2
     Wed Sep 3 20:57:52 2008 UTC by parrello
     Branch: MAIN
     CVS Tags: mgrast_rel_2008_0919, mgrast_rel_2008_0917
     Changes since 1.1: +511 -454 lines
     Still a work in progress. -->

<Database>
  <Title>Sapling Bioinformatics Database</Title>
  <Notes>The Sapling database is a distributable, self-contained copy of the NMPDR data.
    Unlike Sprout, which is optimized for searching, Sapling is designed to be structurally
    simple without sacrificing the ability to find information quickly.</Notes>
  <Issues>
    <Issue>Must add the new "image" data type to ERDB.</Issue>
    <Issue>Must add the new "dna" data type to ERDB.</Issue>
    <Issue>Must add back the ability to index a secondary relation. Note that
           such indexes can only have a single field.</Issue>
    <Issue>We probably need some type tables that describe things like Identifier(source)
           or Family(kind).</Issue>
    <Issue>The ERDB documentation needs to be updated to include DisplayInfo, Asides,
           the "converse" attribute for relationships, and the Shapes section.</Issue>
    <Issue>Similarities and pairings are not hooked in correctly.</Issue>
  </Issues>
  <Entities>
    <Entity name="Compound" keyType="name-string">
      <DisplayInfo theme="web" col="3" row="1"/>
      <Notes>A compound is a chemical that participates in a reaction.
             All compounds have a unique ID and may also have one or more names. Both
             ligands and reaction components are treated as compounds.</Notes>
      <Fields>
        <Field name="label" type="string">
          <Notes>Primary name of the compound. This is the name used in reaction
                 display strings.</Notes>
        </Field>
        <Field name="name" type="string" relation="CompoundName">
          <Notes>Alternate name for the compound. A compound may have many
                 alternate names. The primary name should also be one of the
                 alternate names.</Notes>
        </Field>
        <Field name="cas-id" type="string" relation="CompoundCAS">
          <Notes>The Chemical Abstract Service ID for the compound. A
                 compound may have at most one CAS ID.</Notes>
        </Field>
        <Field name="zinc-id" type="string" relation="CompoundZinc">
          <Notes>The ZINC database ID for the compound. A compound may
                 have at most one ZINC ID.</Notes>
        </Field>
      </Fields>
      <Indexes>
        <Index>
          <Notes>This index allows searching for compounds by name.</Notes>
          <IndexFields>
            <IndexField name="name" order="ascending"/>
          </IndexFields>
        </Index>
        <Index>
          <Notes>This index allows searching for compounds by CAS ID.</Notes>
          <IndexFields>
            <IndexField name="cas-id" order="ascending"/>
          </IndexFields>
        </Index>
        <Index>
          <Notes>This index allows searching for compounds by ZINC ID.</Notes>
          <IndexFields>
            <IndexField name="zinc-id" order="ascending"/>
          </IndexFields>
        </Index>
      </Indexes>
    </Entity>
    <Entity name="Diagram" keyType="name-string">
      <DisplayInfo theme="web" col="5" row="3"/>
      <Notes>A functional diagram describes a network of chemical reactions, often comprising a single
             subsystem. A diagram is identified by a short name and also has a longer descriptive name.</Notes>
      <Fields>
        <Field name="name" type="text">
          <Notes>Descriptive name of this diagram.</Notes>
        </Field>
        <Field name="content" type="image" relation="DiagramContent">
          <Notes>The content of the diagram: a PNG image encoded in MIME base64.</Notes>
        </Field>
      </Fields>
    </Entity>
    <Entity name="Reaction" keyType="key-string">
      <DisplayInfo theme="web" col="3" row="3"/>
      <Notes>A reaction is a chemical process that converts one set of compounds (substrates)
             to another set (products). The reaction ID is generally a small number preceded by a
             letter.</Notes>
      <Fields>
        <Field name="url" type="string" relation="ReactionURL">
          <Notes>HTML string containing a link to a web location that describes the
                 reaction. This field is optional.</Notes>
        </Field>
        <Field name="rev" type="boolean">
          <Notes>TRUE if this reaction is reversible, else FALSE</Notes>
        </Field>
      </Fields>
    </Entity>
    <Entity name="Subsystem" keyType="id-string">
      <DisplayInfo theme="seed" col="7" row="3"/>
      <Notes>A subsystem is a collection of roles that work together in a cell. Identification of subsystems
             is an important tool for recognizing parallel genetic features in different organisms. The key
             is an alphanumeric code string.</Notes>
      <Fields>
        <Field name="name" type="string">
          <Notes>Displayable name of this subsystem.</Notes>
        </Field>
        <Field name="version" type="int">
          <Notes>Version number for the subsystem. This value is incremented each time the subsystem
                 is backed up.</Notes>
        </Field>
        <Field name="curator" type="string">
          <Notes>Name of the person currently in charge of the subsystem.</Notes>
        </Field>
        <Field name="notes" type="text">
          <Notes>Descriptive notes about the subsystem.</Notes>
        </Field>
        <Field name="description" type="text">
          <Notes>Description of the subsystem's function in the cell.</Notes>
        </Field>
        <Field name="classification" type="string">
          <Notes>Classification string, colon-delimited. This string organizes the
                 subsystems into a hierarchy.</Notes>
        </Field>
      </Fields>
      <Indexes>
        <Index>
          <Notes>This index is used to get the subsystems in hierarchical order.</Notes>
          <IndexFields>
            <IndexField name="classification" order="ascending"/>
          </IndexFields>
        </Index>
        <Index>
          <Notes>This index is used to get the subsystem by name.</Notes>
          <IndexFields>
            <IndexField name="name" order="ascending"/>
          </IndexFields>
        </Index>
      </Indexes>
    </Entity>
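    <!--
Because the classification string is colon-delimited, the subsystems can be nested into a
tree by splitting it. What follows is a minimal Python sketch. It assumes a bare colon
separates the levels and that there is no escaping. `build_hierarchy` and any sample data
used with it are hypothetical and are not part of ERDB.

```python
from typing import Any


def build_hierarchy(subsystems: dict[str, str]) -> dict[str, Any]:
    """Nest subsystem names under their colon-delimited classification.

    `subsystems` maps a subsystem name to its classification string.
    Each classification level becomes a nested dict; the subsystem names
    at a given level are collected in a list under the key "subsystems"
    (a level literally named "subsystems" would collide; fine for a sketch).
    """
    tree: dict[str, Any] = {}
    # Iterate in name order so every "subsystems" list comes out sorted.
    for name, classification in sorted(subsystems.items()):
        node = tree
        for level in (part.strip() for part in classification.split(":")):
            if level:  # skip empty segments, e.g. from a trailing colon
                node = node.setdefault(level, {})
        node.setdefault("subsystems", []).append(name)
    return tree
```

Sorting on the classification field, which is what the index above does, produces the same
hierarchical order as a depth-first walk of this tree.
    -->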
    <Entity name="Publication" keyType="hash-string">
      <DisplayInfo theme="web" col="1" row="8"/>
      <Notes>A _publication_ is an article or citation that may be used as evidence for
             assertions made in the database. The key is a hash code computed from the URL.</Notes>
      <Fields>
        <Field name="url" type="string">
          <Notes>URL of the article or of its citation.</Notes>
        </Field>
        <Field name="citation" type="text">
          <Notes>Citation string for the article.</Notes>
        </Field>
      </Fields>
      <Indexes>
        <Index>
          <Notes>This index allows searching for the article by the author names and title.</Notes>
          <IndexFields>
            <IndexField name="citation" order="ascending"/>
          </IndexFields>
        </Index>
      </Indexes>
    </Entity>
    <Entity name="Variant" keyType="name-string">
      <DisplayInfo theme="seed" col="7" row="5"/>
      <Notes>A variant is a functional subset of a subsystem. It indicates the particular
             sequence of roles used to implement a metabolic pathway. Variants are abstract
             concepts used to classify machines. The key of the variant is the subsystem ID followed
             by the variant code (usually a numeric string with zero or more decimal points).</Notes>
      <Fields>
        <Field name="role-rule" type="text">
          <Notes>Boolean expression (encoded as text) that describes the roles in this variant.
                 The roles themselves are represented by their IDs.</Notes>
        </Field>
      </Fields>
    </Entity>
    <Entity name="ProteinSequence" keyType="hash-string">
      <DisplayInfo theme="web" col="3" row="7" caption="Protein Sequence"/>
      <Notes>A protein sequence is a specific sequence of amino acids. Unlike a DNA sequence, a
             protein sequence does not belong to a genome. Identical proteins generated by different
             genomes are generally stored as a single ProteinSequence instance. The key is a
             hash of the protein letter sequence.</Notes>
      <Fields>
        <Field name="sequence" type="dna">
          <Notes>The sequence contains the letters corresponding to the protein's
                 amino acids.</Notes>
        </Field>
        <Field name="iedb" type="text" relation="ProteinSequenceIEDB" special="property_search">
          <Notes>A value indicating whether or not the protein sequence can be found in the
                 Immune Epitope Database. If the sequence has not been matched to that database,
                 this field will have no values. Otherwise, it will have an epitope name and/or
                 sequence, hyperlinked to the database.</Notes>
        </Field>
        <Field name="signal-peptide" type="name-string">
          <Notes>The signal peptide location for this protein. This is expressed as the start and end
                 positions of the relevant amino acids, separated by a hyphen. So, "1-22" would indicate a
                 signal peptide at the beginning of the protein, extending through amino acid position 22.
                 An empty string means no signal peptide is present.</Notes>
        </Field>
        <Field name="transmembrane-map" type="text">
          <Notes>A map indicating which sections of a protein will be embedded in a membrane.
                 This is expressed as a comma-separated list of start and end positions, each pair
                 joined by a hyphen. So, "10-12, 40-60" would indicate that there are two
                 sections of the protein that become embedded in a membrane: the 10th through 12th
                 amino acids, and the 40th through the 60th. An empty string means no
                 transmembrane regions are known.</Notes>
        </Field>
        <Field name="similar-to-human" type="boolean">
          <Notes>TRUE if this protein is similar to one found in humans, else FALSE</Notes>
        </Field>
        <Field name="isoelectric-point" type="float">
          <Notes>pH of the surrounding medium at which the protein carries no net charge.
                 If the pH of the medium is lower than this value, the protein will have a net
                 positive charge. If the pH of the medium is higher, then the protein will have a
                 net negative charge.</Notes>
        </Field>
        <Field name="molecular-weight" type="float">
          <Notes>Molecular weight of this feature's protein, in daltons. A weight of 0
                 indicates that no protein is created.</Notes>
        </Field>
      </Fields>
    </Entity>
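    <!--
The signal-peptide and transmembrane-map fields share a simple range syntax. What follows is
a minimal Python sketch of a parser for it. It assumes that positions are 1-based and
inclusive, as the examples above indicate, and that hyphens serve no other purpose.
`parse_ranges` and `covered_positions` are hypothetical helper names and are not part of
ERDB.

```python
def parse_ranges(text: str) -> list[tuple[int, int]]:
    """Parse "start-end" ranges such as "1-22" or "10-12, 40-60".

    Positions are 1-based amino-acid numbers and each range is inclusive.
    An empty string yields an empty list (no region present).
    """
    ranges = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        start, end = (int(n) for n in part.split("-", 1))
        ranges.append((start, end))
    return ranges


def covered_positions(text: str) -> int:
    """Count the amino acids covered by the ranges (both ends included)."""
    return sum(end - start + 1 for start, end in parse_ranges(text))
```

For "10-12, 40-60", the parser returns two ranges covering 3 and 21 residues respectively.
    -->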
    <Entity name="Family" keyType="name-string">
      <DisplayInfo theme="seed" col="4" row="11"/>
      <Notes>A family is a group of features united by a particular determination algorithm.
             The algorithm will frequently, but not always, signify a functional role.</Notes>
    </Entity>
    <Entity name="MolecularMachine" keyType="key-string">
      <DisplayInfo theme="seed" col="7" row="7" caption="Molecular\nMachine"/>
      <Notes>A molecular machine is a collection of features that implements a metabolic pathway. Machines
             are the physical instances of variants. Each machine corresponds to a row in a subsystem
             spreadsheet. The key is the variant key followed by a colon and the Genome ID.</Notes>
      <Fields>
        <Field name="type" type="key-string">
          <Notes>The machine type indicates how it relates to the parent variant. A type
                 of "vacant" means that the machine does not appear to actually exist in the
                 organism. A type of "incomplete" means that the machine appears to be missing
                 many reactions. In all other cases, the type is "normal".</Notes>
        </Field>
      </Fields>
    </Entity>
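    <!--
The machine key is built from two parts joined by a colon. What follows is a minimal Python
sketch. It assumes that genome IDs never contain a colon, so a key can be split on its last
colon even if a variant key contains one. The function names and sample keys are
hypothetical.

```python
def machine_key(variant_key: str, genome_id: str) -> str:
    """Build a molecular machine key: the variant key, a colon, then the genome ID."""
    return f"{variant_key}:{genome_id}"


def split_machine_key(key: str) -> tuple[str, str]:
    """Recover (variant_key, genome_id) from a machine key.

    Splits on the LAST colon. ASSUMPTION: genome IDs never contain a colon;
    variant keys are not assumed to be colon-free.
    """
    variant_key, sep, genome_id = key.rpartition(":")
    if not sep:
        raise ValueError(f"not a machine key: {key!r}")
    return variant_key, genome_id
```

Splitting from the right keeps the sketch correct regardless of which separator a variant
key uses internally.
    -->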
    <Entity name="Scenario" keyType="string">
      <DisplayInfo theme="web" col="5" row="1"/>
      <Notes>A scenario is a partial instance of a subsystem with a defined set of
             reactions. Each scenario converts input compounds to output compounds using reactions.
             The scenario may use all of the reactions controlled by a subsystem or only
             some, and may also incorporate additional reactions.</Notes>
    </Entity>
    <Entity name="Pairing" keyType="name-string">
      <DisplayInfo theme="seed" col="5" row="11"/>
      <Notes>A pairing indicates that two protein sequences are found close together on one or
             more DNA sequences. Not all possible pairings are stored in the database; only those
             considered significant for annotation purposes are kept. The key of the pairing is the
             concatenation of the protein sequence keys in alphabetical order.</Notes>
      <Asides>Because the protein sequence key is a hash of the sequence letters, the key of a pairing between two
              sequences is computable from the sequences themselves. Theoretically, the pairing
              is unordered: (A,B) and (B,A) are the same pairing. It is frequently the case,
              however, that we need to refer to the "first" or "second" protein in the pairing.
              When this happens, the first one is always the protein with the alphabetically
              lesser key. The IsInPair relationship automatically shows the proteins in this
              order.</Asides>
    </Entity>
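    <!--
The Asides above note that a pairing key can be computed from the sequences alone. What
follows is a minimal Python sketch of that computation. The schema says only that a protein
key is "a hash" of the sequence, so the MD5 hex digest used here is an assumption. The
function names are hypothetical.

```python
import hashlib


def protein_key(sequence: str) -> str:
    """Hypothetical protein sequence key.

    ASSUMPTION: the schema only says the key is a hash of the protein
    letter sequence; an MD5 hex digest is used here for illustration.
    """
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()


def ordered_pair(key_a: str, key_b: str) -> tuple[str, str]:
    """Return (first, second): the first protein has the alphabetically lesser key."""
    first, second = sorted((key_a, key_b))
    return first, second


def pairing_key(key_a: str, key_b: str) -> str:
    """Concatenate the two protein keys in alphabetical order.

    Because the keys are sorted first, the result does not depend on
    argument order. Plain concatenation stays unambiguous only while every
    protein key has the same fixed length, as hex digests do.
    """
    first, second = ordered_pair(key_a, key_b)
    return first + second
```

Sorting before concatenating is what makes (A,B) and (B,A) map to the same key. The same
ordering also defines which protein is "first" in the pairing.
    -->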
    <Entity name="Genome" keyType="name-string">
      <DisplayInfo theme="nmpdr" col="7" row="9" caption="Genome Organism"/>
      <Notes>A genome represents a specific organism with DNA, or a specific meta-genome. All DNA
             sequences in the database belong to genomes.</Notes>
      <Fields>
        <Field name="full-name" type="name-string">
          <Notes>Full genus/species/strain name of the genome.</Notes>
        </Field>
        <Field name="domain" type="name-string">
          <Notes>Taxonomic domain of this genome. The domain is
                 the highest level of the taxonomy tree.</Notes>
        </Field>
        <Field name="version" type="name-string">
          <Notes>Version string for this genome, generally consisting of the genome ID followed
                 by a period and a string of digits.</Notes>
        </Field>
        <Field name="complete" type="boolean">
          <Notes>TRUE if the genome is complete, else FALSE</Notes>
        </Field>
        <Field name="dna-size" type="counter">
          <Notes>Number of base pairs in the genome.</Notes>
        </Field>
        <Field name="primary-group" type="name-string">
          <Notes>The primary NMPDR group for this organism. Each organism has exactly one value for this
                 field; an empty string indicates that the organism is a supporting organism rather than a
                 member of an NMPDR group. In general, more data is kept on organisms in NMPDR groups than
                 on supporting organisms.</Notes>
        </Field>
        <Field name="contigs" type="int">
          <Notes>Number of contigs for this organism.</Notes>
        </Field>
        <Field name="pegs" type="int">
          <Notes>Number of protein-encoding genes for this organism.</Notes>
        </Field>
        <Field name="rnas" type="int">
          <Notes>Number of RNA features found for this organism.</Notes>
        </Field>
      </Fields>
      <Indexes>
        <Index>
          <Notes>This index allows the applications to find all genomes associated with
                 a specific primary (NMPDR) group.</Notes>
          <IndexFields>
            <IndexField name="primary-group" order="ascending"/>
            <IndexField name="full-name" order="ascending"/>
          </IndexFields>
        </Index>
        <Index>
          <Notes>This index allows the applications to find all genomes in lexical
                 order by name.</Notes>
          <IndexFields>
            <IndexField name="full-name" order="ascending"/>
          </IndexFields>
        </Index>
      </Indexes>
    </Entity>
    <Entity name="Feature" keyType="id-string">
      <DisplayInfo theme="seed" col="5" row="9"/>
      <Notes>A feature (sometimes also called a gene) is a part of a genome that is of special
             interest. Features may be spread across multiple DNA sequences (contigs) of a genome, but
             never across more than one genome. Each feature in the database has a unique FIG ID.</Notes>
      <Fields>
        <Field name="feature-type" type="id-string">
          <Notes>Code indicating the type of this feature. Among the codes currently
                 supported are "peg" for a protein encoding gene, "bs" for a
                 binding site, "opr" for an operon, and so forth.</Notes>
        </Field>
        <Field name="link" type="text" relation="FeatureLink">
          <Notes>Web hyperlink for this feature. A feature can have no hyperlinks or it can have many. The
                 links point to other websites that have useful information about the gene that the feature
                 represents, and are coded as raw HTML, using an anchor href tag.</Notes>
        </Field>
        <Field name="essential" type="text" relation="FeatureEssential" special="property_search">
          <Notes>A value indicating the essentiality of the feature, coded as HTML. In most
                 cases, this will be a word describing whether the essentiality is confirmed (essential)
                 or potential (potential-essential), hyperlinked to the document from which the
                 essentiality was curated. If a feature is not essential, this field will have no
                 values; otherwise, it may have multiple values.</Notes>
        </Field>
        <Field name="virulent" type="text" relation="FeatureVirulent" special="property_search">
          <Notes>A value indicating the virulence of the feature, coded as HTML. In most
                 cases, this will be a phrase or SA number hyperlinked to the document from which
                 the virulence information was curated. If the feature is not virulent, this field
                 will have no values; otherwise, it may have multiple values.</Notes>
        </Field>
        <Field name="sequence-length" type="counter">
          <Notes>Number of base pairs in this feature.</Notes>
        </Field>
        <Field name="evidence-code" type="string" relation="FeatureEvidence">
          <Notes>An evidence code describes the possible evidence that exists
                 for deciding a feature's functional assignment. A feature may have no evidence,
                 a single evidence code, or several.</Notes>
        </Field>
        <Field name="function" type="text">
          <Notes>Functional assignment for this feature. This will often indicate
                 the feature's functional role or roles, and may also have comments.</Notes>
          <Asides>Frequently, a feature is assigned to a single role, in which case the
                  role is identical to the function. In some cases, a feature will have
                  multiple roles, and all of them will be listed in the function field. In addition,
                  the function may have comment text at the end.</Asides>
        </Field>
      </Fields>
    </Entity>
    <Entity name="Annotation" keyType="string">
      <DisplayInfo col="3" row="11" theme="seed"/>
      <Notes>An annotation is a comment attached to a feature. Annotations are used to
             track the history of a feature's functional assignments and any related issues. The
             key is the feature ID followed by a colon and a complemented eight-digit sequence number.</Notes>
      <Asides>The complemented sequence number causes the annotations to sort with the most recent one
              first.</Asides>
      <Fields>
        <Field name="annotator" type="string">
          <Notes>Name of the annotator who made the comment.</Notes>
        </Field>
        <Field name="comment" type="text">
          <Notes>Text of the annotation.</Notes>
        </Field>
        <Field name="annotation-time" type="date">
          <Notes>Date and time at which the annotation was made.</Notes>
        </Field>
      </Fields>
    </Entity>
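    <!--
The schema does not spell out how the sequence number is complemented. What follows is a
minimal Python sketch. It assumes the complement is taken against 99999999, the largest
eight-digit number. `annotation_key` is a hypothetical name, and the sample FIG ID is used
only for illustration.

```python
MAX_SEQ = 99_999_999  # largest eight-digit number (assumed complement base)


def annotation_key(feature_id: str, seq: int) -> str:
    """Build an annotation key: feature ID, a colon, and the complemented sequence number.

    ASSUMPTION: "complemented" means MAX_SEQ - seq, zero-padded to eight
    digits, so a later (larger) sequence number yields a smaller suffix.
    """
    if not 0 <= seq <= MAX_SEQ:
        raise ValueError("sequence number must fit in eight digits")
    return f"{feature_id}:{MAX_SEQ - seq:08d}"
```

Because every suffix has exactly eight digits, a plain ascending string sort on the keys of
one feature puts the highest sequence number, the most recent annotation, first. For
example, sequence numbers 1, 2 and 10 give suffixes 99999998, 99999997 and 99999989.
    -->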
    <Entity name="Role" keyType="hash-string">
      <DisplayInfo theme="web" col="5" row="5"/>
      <Notes>A role describes a biological function that may be fulfilled by a feature.
             One of the main goals of the database is to assign features to roles. Most
             roles are effected by the construction of proteins. Some, however, deal with
             functional regulation and message transmission.</Notes>
      <Asides>A role represents a single gene function. Many roles are in
              subsystems, but some are not. If a feature has multiple functions, each
              is represented as a separate role.</Asides>
      <Fields>
        <Field name="hypothetical" type="boolean">
          <Notes>TRUE if a role is hypothetical, else FALSE</Notes>
        </Field>
        <Field name="name" type="string">
          <Notes>English name of this role. The actual role ID is computed from this field.</Notes>
        </Field>
      </Fields>
    </Entity>
    <Entity name="RoleSet" keyType="int">
      <DisplayInfo theme="web" col="3" row="5" caption="Role Set"/>
      <Notes>A role set is a group of roles that work together to stimulate a reaction. Most role sets consist of a single
             role; however, some reactions require the presence of multiple roles to get them started.</Notes>
      <Asides>A reaction is usually triggered by a single role, but some reactions are triggered
              by a boolean combination of roles (e.g. =(A and (B or C) and D) or (E and B and F) or G=). The boolean
              expression can be converted into disjunctive normal form, which is a list of alternative sets
              (e.g. =(A and B and D) or (A and C and D) or (E and B and F) or G=). Each alternative is then converted
              into a role set. This allows us to precisely represent the triggering conditions of a reaction in the database.</Asides>
    </Entity>
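    <!--
The conversion to disjunctive normal form described above can be sketched as follows. This
is a minimal Python illustration, not the Sapling loader's code. It assumes the boolean
expression has already been parsed into nested tuples, and `to_role_sets` is a hypothetical
name. An AND takes the cross product of its children's alternatives, and an OR concatenates
them.

```python
from itertools import product
from typing import Union

# A role expression is a role ID string, or a tuple ("and" | "or", child, child, ...).
Expr = Union[str, tuple]


def to_role_sets(expr: Expr) -> list[frozenset[str]]:
    """Expand a boolean role expression into disjunctive normal form.

    Each returned frozenset is one alternative (one role set): the reaction
    is triggered if every role in any one of the sets is present.
    """
    if isinstance(expr, str):
        return [frozenset([expr])]
    op, *children = expr
    expanded = [to_role_sets(child) for child in children]
    if op == "or":
        # The alternatives of an OR are simply the alternatives of each child.
        return [alt for alts in expanded for alt in alts]
    if op == "and":
        # An AND needs one alternative from every child: take the cross product.
        return [frozenset().union(*combo) for combo in product(*expanded)]
    raise ValueError(f"unknown operator: {op!r}")
```

Applied to the example in the Asides, the expression
("or", ("and", "A", ("or", "B", "C"), "D"), ("and", "E", "B", "F"), "G")
expands to the four role sets {A, B, D}, {A, C, D}, {E, B, F} and {G}.
    -->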
    <Entity name="DnaSequence" keyType="name-string">
      <DisplayInfo theme="nmpdr" col="7" row="11" caption="DNA Sequence"/>
      <Notes>A DNA sequence (sometimes called a "contig") is a contiguous sequence of base pairs
             belonging to a single genome. The key of the DNA sequence is the genome ID followed by
             the contig ID.</Notes>
      <Fields>
        <Field name="length" type="counter">
          <Notes>Number of base pairs in the DNA sequence.</Notes>
        </Field>
        <Field name="bases" type="text" relation="DnaSequenceBases">
          <Notes>A string of letters representing the nucleotides of the sequence.</Notes>
        </Field>
      </Fields>
    </Entity>
    <Entity name="TaxonomicGrouping" keyType="string">
      <DisplayInfo row="10" col="8" caption="Taxonomic\nGrouping" theme="nmpdr"/>
      <Notes>A taxonomic grouping is a segment of the classification for an organism.
             Taxonomic groupings are organized into a strict hierarchy by the IsClassOf
             relationship.</Notes>
      <Fields>
        <Field name="level" type="int">
          <Notes>Taxonomic classification level. A level of 0 indicates that this is
                 a specific strain with DNA attached. Higher levels indicate progressively
                 larger classifications. Each level number represents a specific type of
                 classification. Sub-species is always 1, species is always 2, genus is always
                 3, and so forth, up to 99 for domain. This means that as you travel up the
                 taxonomy tree, the level numbers will not be consecutive.</Notes>
        </Field>
      </Fields>
      <Indexes>
        <Index>
          <Notes>This index allows the applications to find all groupings by level.
                 Lower (less inclusive) levels will occur first.</Notes>
          <IndexFields>
            <IndexField name="level" order="ascending"/>
          </IndexFields>
        </Index>
      </Indexes>
    </Entity>
    <Entity name="Structure" keyType="name-string">
      <DisplayInfo theme="web" col="2" row="5"/>
      <Notes>A structure is the geometrical representation of a protein sequence. A single protein sequence may
             have multiple structural representations, either because it is folded in different ways or because there
             are alternative representation formats. The key field is the representation type (e.g. PDB, SCOPE)
             followed by the ID, with an intervening vertical bar.</Notes>
    </Entity>
    <Entity name="FcEvidenceSet" keyType="int">
      <DisplayInfo theme="seed" col="5" row="13" caption="Functional Coupling Evidence Set"/>
      <Notes>A functional coupling evidence set indicates evidence for a functional connection between protein
             sequence pairs. The protein sequences possessing the connection are the ones that
             participate in the evidence set's pairings.</Notes>
      <Asides>The pairings for a particular evidence set
              will contain protein sequences that are significantly similar. In other words, if
              (A,B) and (X,Y) are both pairings in a single evidence set, then either (A =~ X and B =~ Y)
              or (A =~ Y and B =~ X), depending on the value of the "inverted" attribute of
              the IsDeterminedBy relationship. Essentially, a pairing in its own right is unordered:
              if (A,B) is a pair, then so is (B,A). However, the evidence set maintains a correspondence
              between its pairs that _is_ ordered, because the constituent pairs must match. The
              direction in which a pair matches the others in the set is an attribute of the relationship
              from the pairs to the sets.</Asides>
      <Fields>
        <Field name="score" type="int">
          <Notes>Score for this evidence set. The score indicates the number of
                 significantly different genomes represented by the pairings.</Notes>
        </Field>
      </Fields>
    </Entity>
    <Entity name="MachineRole" keyType="name-string">
      <DisplayInfo row="7" col="5" caption="Machine Role" theme="seed"/>
      <Notes>A machine role represents a role as it occurs in a molecular machine. The key
             is the machine key plus the role abbreviation.</Notes>
      <Asides>The machine role corresponds to a cell on the subsystem spreadsheet. Features
              in the subsystem are assigned directly to the machine role.</Asides>
    </Entity>
    <Entity name="IdentifierSet" keyType="name-string">
      <DisplayInfo row="9" col="1" theme="seed"/>
      <Notes>The identifier set is a group of identifiers that refer to the same object, usually either a feature
             or a protein sequence. The identifiers in a set will frequently belong to different genomic databases.
             Thus, if a specific protein sequence has one name in the NMPDR and another name in RefSeq, both of
             the names would be in the same identifier set.</Notes>
    </Entity>
    <Entity name="Identifier" keyType="string">
      <DisplayInfo theme="seed" col="3" row="9"/>
      <Notes>An identifier is an alternate name for a feature or protein sequence.</Notes>
      <Asides>Some identifiers name features or protein sequences that do not exist in the database. In this case,
              the feature or protein sequence is considered _external_; that is, it belongs to another database.</Asides>
      <Fields>
        <Field name="source" type="key-string">
          <Notes>Specific type of the identifier, such as its source database or category.
                 The type can usually be decoded to convert the identifier to a URL.</Notes>
        </Field>
      </Fields>
      <Indexes>
        <Index>
          <Notes>This index allows all the identifiers of a specified type to be located.</Notes>
          <IndexFields>
            <IndexField name="source" order="ascending"/>
          </IndexFields>
        </Index>
      </Indexes>
    </Entity>
  </Entities>
  <Relationships>
    <Relationship name="IsTerminusFor" from="Compound" to="Scenario" arity="MM" converse="HasAsTerminus">
      <DisplayInfo caption="Has As\nTerminus"/>
      <Notes>A terminus for a scenario is a compound that acts as its input or output. A compound
             can be the terminus for many scenarios, and a scenario will have many termini. The relationship
             attributes indicate whether the compound is an input to the scenario or an output. In some
             cases, there may be multiple alternative output groups. This is also indicated by the
             attributes.</Notes>
      <Fields>
        <Field name="group-number" type="int">
          <Notes>If zero, then the compound is an input. Otherwise, this is the index number
                 of the output group. Each output group represents an alternative set of output
                 compounds.</Notes>
        </Field>
      </Fields>
      <ToIndex>
        <Notes>This index allows the application to view a scenario's compounds by group.</Notes>
        <IndexFields>
          <IndexField name="group-number" order="ascending"/>
        </IndexFields>
      </ToIndex>
    </Relationship>
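    <!--
The group-number convention above can be decoded as follows. What follows is a minimal
Python sketch. It assumes the IsTerminusFor rows for one scenario have already been fetched
as (compound ID, group number) tuples. `split_termini` and any compound IDs used with it are
hypothetical.

```python
from collections import defaultdict


def split_termini(termini: list[tuple[str, int]]) -> tuple[list[str], dict[int, list[str]]]:
    """Split a scenario's termini into inputs and alternative output groups.

    `termini` holds (compound_id, group_number) rows. A group number of 0
    marks an input; any other value is the index of an output group, and
    each output group is one alternative set of output compounds.
    """
    inputs: list[str] = []
    outputs: dict[int, list[str]] = defaultdict(list)
    # Sort by group number, then compound ID, mirroring the ToIndex ordering.
    for compound, group in sorted(termini, key=lambda row: (row[1], row[0])):
        if group == 0:
            inputs.append(compound)
        else:
            outputs[group].append(compound)
    return inputs, dict(outputs)
```

Keeping each output group as its own list preserves the "alternative sets" meaning. A caller
can then treat the scenario as producing the compounds of any one group.
    -->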
    <Relationship name="IsRelevantFor" from="Diagram" to="Subsystem" arity="MM" converse="IsRelevantTo">
      <DisplayInfo theme="seed" caption="Is\nRelevant\nFor"/>
      <Notes>This relationship connects each subsystem to the diagrams that are useful in curating
             and understanding the subsystem. A subsystem may overlap many diagrams, but only those considered
             crucial are connected via this relationship. The relationship is many-to-many.</Notes>
    </Relationship>
    <Relationship name="Describes" from="Subsystem" to="Variant" arity="1M" converse="IsDescribedBy">
      <DisplayInfo theme="seed"/>
      <Notes>This relationship connects a subsystem to the individual variants used
            to implement it. Each variant contains a slightly different subset of the
            roles in the parent subsystem.</Notes>
    </Relationship>
    <Relationship name="Shows" from="Diagram" to="Reaction" arity="MM" converse="IsShowedOn">
      <DisplayInfo theme="web"/>
      <Notes>This relationship connects a diagram to its reactions. A diagram shows multiple
            reactions, and a reaction can be on many diagrams.</Notes>
    </Relationship>
    <Relationship name="IsOwnerOf" from="Genome" to="Feature" arity="1M" converse="IsOwnedBy">
      <DisplayInfo caption="Is\nOwned\nBy" theme="seed"/>
      <Notes>This relationship connects each feature to its parent genome.</Notes>
    </Relationship>
    <Relationship name="IsImplementedBy" from="Variant" to="MolecularMachine" arity="1M" converse="Implements">
      <DisplayInfo theme="seed" caption="Is\nImplemented\nBy" row="6" col="7"/>
      <Notes>This relationship connects a variant to the physical machines that implement
            it in the genomes. A variant is implemented by many machines, but a machine belongs to
            only one variant.</Notes>
    </Relationship>
    <Relationship name="Uses" theme="seed" from="Genome" to="MolecularMachine" arity="1M" converse="IsUsedBy">
      <DisplayInfo theme="seed" caption="Is\nUsed\nBy"/>
      <Notes>This relationship connects a genome to the machines that form its
            metabolic pathways. A genome can use many machines, but a machine is used by exactly
            one genome.</Notes>
    </Relationship>
    <Relationship name="Includes" from="Subsystem" to="Role" arity="MM" converse="IsIncludedIn">
      <DisplayInfo theme="seed" caption="Includes"/>
      <Notes>A subsystem is defined by its roles. The subsystem's variants contain slightly
            different sets of roles, but all of the roles in a variant must be connected to the
            parent subsystem by this relationship. A subsystem always has at least one
            role, and a role always belongs to at least one subsystem.</Notes>
      <Fields>
        <Field name="sequence" type="counter">
          <Notes>Sequence number of the role within the subsystem. When the roles
                 are formed into a variant, they will generally appear in sequence order.</Notes>
        </Field>
        <Field name="abbreviation" type="key-string">
          <Notes>Abbreviation for this role in this subsystem. The abbreviations are
used in columnar displays, and they also appear on diagrams.</Notes>
        </Field>
      </Fields>
      <FromIndex>
        <Notes>This index ensures that the roles of the subsystem are presented in sequence
                order.</Notes>
        <IndexFields>
          <IndexField name="sequence" order="ascending"/>
        </IndexFields>
      </FromIndex>
    </Relationship>
    <Relationship name="Implements" from="ProteinSequence" to="Role" arity="MM" converse="IsCatalyzedBy">
      <DisplayInfo theme="web" caption="Is\nImplemented\nBy"/>
      <Notes>This relationship connects a protein sequence to the functional roles it
            implements in the cell. A protein sequence can implement many roles, and a role can
            be implemented by many protein sequences. Roles that perform regulatory or message
            transmission functions do not participate in this relationship.</Notes>
    </Relationship>
    <Relationship name="IsCombinationOf" from="RoleSet" to="Role" arity="MM" converse="IsInCombination">
      <DisplayInfo theme="web" caption="Is\nCombination\nOf"/>
      <Notes>This relationship combines roles into role sets. Each role set is a combination of roles that can
trigger a reaction.</Notes>
    </Relationship>
    <Relationship name="IsTriggeredBy" from="Reaction" to="RoleSet" arity="MM" converse="Triggers">
      <DisplayInfo theme="web" caption="Is\nTriggered\nBy"/>
      <Notes>A reaction can be triggered by many role sets. A role set can trigger many reactions.</Notes>
    </Relationship>
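    <!-- Sketch, not part of the schema: one plausible reading of IsCombinationOf and
         IsTriggeredBy. It assumes a reaction counts as triggered when every role in at least
         one of its role sets is present. The dictionaries stand in for the relationship data,
         and the function name and all IDs are hypothetical.

```python
from typing import Dict, Iterable, Set


def triggered_reactions(
    triggers: Dict[str, Set[str]],   # reaction ID -> role-set IDs (IsTriggeredBy)
    role_sets: Dict[str, Set[str]],  # role-set ID -> role IDs (IsCombinationOf)
    present_roles: Iterable[str],
) -> Set[str]:
    """Return reactions with at least one role set fully covered by present_roles.

    Assumption: a role set fires only when all of its roles are present.
    """
    present = set(present_roles)
    return {
        reaction
        for reaction, set_ids in triggers.items()
        if any(role_sets[s] <= present for s in set_ids)
    }
```
    -->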
    <Relationship name="IsClassOf" from="TaxonomicGrouping" to="TaxonomicGrouping" arity="1M" converse="IsClassifiedAs">
      <DisplayInfo theme="nmpdr" col="8" row="11" fixed="1" caption="Is\nClass\nOf"/>
      <Notes>The recursive IsClassOf relationship organizes taxonomic groupings into a hierarchy
            based on the standard organism taxonomy.</Notes>
    </Relationship>
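    <!-- Sketch, not part of the schema: IsClassOf is a recursive 1M relationship, so each
         taxonomic grouping has at most one parent. A grouping's lineage can therefore be
         recovered by following IsClassifiedAs links upward. The parent map is a hypothetical
         in-memory copy of the relationship (child ID to parent ID). The cycle check is a
         defensive assumption, because the hierarchy should never contain a cycle.

```python
from typing import Dict, List


def lineage(grouping: str, parent_of: Dict[str, str]) -> List[str]:
    """Walk IsClassifiedAs links from a grouping up to the root.

    parent_of maps a child grouping ID to its parent (the "from" side of IsClassOf).
    Returns the path that starts at `grouping` and ends at the root.
    """
    path = [grouping]
    seen = {grouping}
    while path[-1] in parent_of:
        parent = parent_of[path[-1]]
        if parent in seen:  # defensive: the hierarchy should be acyclic
            raise ValueError(f"cycle detected at {parent}")
        path.append(parent)
        seen.add(parent)
    return path
```
    -->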
    <Relationship name="IsFoundOn" from="Role" to="Diagram" arity="MM" converse="IsLocationOf">
      <DisplayInfo theme="web" caption="Is\nLocation\nOf"/>
      <Notes>This relationship connects a role to the diagrams on which it appears. A diagram
      always contains many roles. A role may appear on multiple diagrams.</Notes>
    </Relationship>
    <Relationship name="IsLocatedIn" from="Feature" to="DnaSequence" arity="MM" converse="IsLocusFor">
      <DisplayInfo theme="seed" caption="Is\nLocated\nIn" fixed="1" row="10" col="6"/>
      <Notes>A feature is a set of DNA sequence fragments. Most features are a single contiguous
            fragment, so they are located in only one DNA sequence; however, fragments have a maximum
            length, so even a single contiguous feature may participate in this relationship multiple
            times. A few features belong to multiple DNA sequences. In that case, however, all the
            DNA sequences belong to the same genome. A DNA sequence itself will frequently have
            thousands of features connected to it.</Notes>
      <Fields>
        <Field name="locN" type="int">
          <Notes>Sequence number of this segment.</Notes>
        </Field>
        <Field name="beg" type="int">
          <Notes>Index (1-based) of the first residue in the contig that
                    belongs to the segment.</Notes>
        </Field>
        <Field name="len" type="int">
          <Notes>Number of residues in the segment. A length of 0 identifies
                    a specific point between residues: the point before the residue at the
                    beginning position if the direction is forward, and the point after that
                    residue if the direction is backward.</Notes>
        </Field>
        <Field name="dir" type="char">
          <Notes>Direction of the segment: "+" if it is forward and
                    "-" if it is backward.</Notes>
        </Field>
      </Fields>
      <FromIndex>
        <Notes>This index allows the application to find all the segments of a feature in
                the proper order.</Notes>
        <IndexFields>
          <IndexField name="locN" order="ascending"/>
        </IndexFields>
      </FromIndex>
      <ToIndex>
        <Notes>This index is the one used by applications to find all the feature
                segments that contain a specific residue.</Notes>
        <IndexFields>
          <IndexField name="beg" order="ascending"/>
        </IndexFields>
      </ToIndex>
    </Relationship>
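    <!-- Sketch, not part of the schema: how the beg/len fields and the beg-ordered ToIndex
         could be used to find the feature segments that contain a given residue. It assumes
         that beg is the leftmost residue of the segment in both directions, so a segment
         spans beg through beg + len - 1. The Segment class, its field names, and the feature
         IDs are hypothetical stand-ins for IsLocatedIn rows.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    feature_id: str  # hypothetical Feature key (the "from" side)
    loc_n: int       # locN: sequence number of the segment within the feature
    beg: int         # 1-based leftmost residue (assumed for both directions)
    length: int      # len; 0 marks a point between residues
    direction: str   # "+" or "-"

    def contains(self, residue: int) -> bool:
        # Assumption: the segment spans beg .. beg + length - 1 whatever its direction,
        # so a zero-length segment contains no residue.
        return self.beg <= residue < self.beg + self.length


def segments_containing(segments: List[Segment], residue: int) -> List[Segment]:
    """Mimic a ToIndex scan: order by beg, keep segments with beg <= residue, then filter."""
    ordered = sorted(segments, key=lambda s: s.beg)
    cutoff = bisect_right([s.beg for s in ordered], residue)
    return [s for s in ordered[:cutoff] if s.contains(residue)]
```
    -->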
    <Relationship name="IsDeterminedBy" from="FcEvidenceSet" to="Pairing" arity="MM" converse="Determines">
      <DisplayInfo theme="seed" caption="Determines"/>
      <Notes>A functional coupling evidence set exists because it has pairings in it, and this relationship
             connects the evidence set to its constituent pairings. A pairing can belong to
             multiple evidence sets.</Notes>
      <Fields>
        <Field name="inverted" type="boolean">
          <Notes>A pairing is an unordered pair of protein sequences, but its
                 similarity to other pairings in an evidence set is ordered. Let (A,B) be
                 a pairing and (X,Y) be another pairing in the same set. If this flag is
                 FALSE, then (A =~ X) and (B =~ Y). If this flag is TRUE, then (A =~ Y) and
                 (B =~ X).</Notes>
        </Field>
      </Fields>
    </Relationship>
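    <!-- Sketch, not part of the schema: one way to apply the "inverted" flag when lining up a
         pairing (A,B) with another pairing (X,Y) from the same evidence set. It reads "=~" in
         the field notes as "is similar to". Pairings are modeled as plain tuples of
         hypothetical protein sequence keys.

```python
from typing import List, Tuple

Pair = Tuple[str, str]


def correspondences(pairing: Pair, reference: Pair, inverted: bool) -> List[Pair]:
    """Pair each member of `pairing` with its similar member of `reference`.

    inverted=False: A ~ X and B ~ Y.  inverted=True: A ~ Y and B ~ X.
    """
    a, b = pairing
    x, y = reference
    if inverted:
        return [(a, y), (b, x)]
    return [(a, x), (b, y)]
```
    -->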
    <Relationship name="IsFunctionOf" from="Role" to="Feature" arity="MM" converse="Targets">
      <DisplayInfo theme="seed" fixed="1" row="7" col="4" caption="Is\nFunction\nOf"/>
      <Notes>This relationship connects a role to the features that facilitate the role.
A role can be the function of multiple features, and a single feature may have
multiple roles.</Notes>
    </Relationship>
    <Relationship name="IsMadeUpOf" from="Genome" to="DnaSequence" arity="1M" converse="MakesUp">
      <DisplayInfo theme="nmpdr" caption="Is\nMade Up\nOf"/>
      <Notes>This relationship connects each genome to the DNA sequences that make it up.</Notes>
    </Relationship>
    <Relationship name="IsAnnotatedBy" from="Feature" to="Annotation" arity="1M" converse="Annotates">
      <DisplayInfo theme="seed" caption="Is\nAnnotated\nBy" fixed="1" col="3" row="10"/>
      <Notes>This relationship connects a feature to its annotations. A feature may have
multiple annotations, but an annotation belongs to only one feature.</Notes>
    </Relationship>
    <Relationship name="HasMember" from="Family" to="Feature" arity="1M" converse="IsMemberOf">
      <DisplayInfo theme="seed" caption="Is\nMember\nOf" row="10" col="4" fixed="1"/>
      <Notes>This relationship connects each feature family to its constituent
             features. A family always has many features, but a single feature can
             be found in at most one family.</Notes>
    </Relationship>
    <Relationship name="Attracts" from="Structure" to="Compound" arity="MM" converse="IsAttractedTo">
      <DisplayInfo theme="web" row="1" col="2" fixed="1" caption="Is\nAttracted\nTo"/>
      <Notes>This relationship connects a compound to the protein structures that attract it.
            This is an incomplete relationship that exists to service drug targeting queries. Only
            the attractions whose parameters have been determined through modeling or
            experimentation are included. The goal is to determine the docking energy between
            the compound and the protein structure.</Notes>
      <Fields>
        <Field name="reason" type="id-string">
          <Notes>Indication of the reason for determining the docking energy.
                    A value of "Random" indicates the docking was attempted as a part
                    of a random survey used to determine the docking characteristics of a
                    protein structure. A value of "Rich" indicates the docking was attempted
                    because a low-energy docking result was predicted for the compound.</Notes>
        </Field>
        <Field name="tool" type="id-string">
          <Notes>Name of the tool used to compute the docking energy.</Notes>
        </Field>
        <Field name="total-energy" type="float">
          <Notes>Total energy required for the compound to dock with the structure,
                    in kcal/mol. A negative value means energy is released.</Notes>
        </Field>
        <Field name="vanderwalls-energy" type="float">
          <Notes>Docking energy in kcal/mol that results from the geometric fit
                    (Van der Waals force) between the structure and the compound.</Notes>
        </Field>
        <Field name="electrostatic-energy" type="float">
          <Notes>Docking energy in kcal/mol that results from the movement of
                    electrons (electrostatic force) between the structure and the
                    compound.</Notes>
        </Field>
      </Fields>
      <FromIndex>
        <Notes>This index enables the application to view a structure's docking results from
                the lowest energy (best docking) to highest energy (worst docking).</Notes>
        <IndexFields>
          <IndexField name="total-energy" order="ascending"/>
        </IndexFields>
      </FromIndex>
      <ToIndex>
        <Notes>This index enables the application to view a compound's docking results from
                the lowest energy (best docking) to highest energy (worst docking).</Notes>
        <IndexFields>
          <IndexField name="total-energy" order="ascending"/>
        </IndexFields>
      </ToIndex>
    </Relationship>
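    <!-- Sketch, not part of the schema: the Attracts FromIndex reproduced in memory. It shows a
         structure's docking results ordered from lowest total energy (best docking) to highest
         total energy (worst docking). The Docking tuple and the structure, compound, and tool
         names are hypothetical.

```python
from typing import List, NamedTuple


class Docking(NamedTuple):
    structure: str
    compound: str
    reason: str          # "Random" or "Rich"
    tool: str
    total_energy: float  # kcal/mol; negative means energy is released


def best_dockings(results: List[Docking], structure: str) -> List[Docking]:
    """Mimic the FromIndex: a structure's dockings from lowest to highest total energy."""
    return sorted(
        (d for d in results if d.structure == structure),
        key=lambda d: d.total_energy,
    )
```
    -->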
    <Relationship name="Involves" from="Reaction" to="Compound" arity="MM" converse="IsInvolvedIn">
      <DisplayInfo theme="web" caption="Is\nInvolved\nIn" fixed="1" row="2" col="2.5"/>
      <Notes>This relationship connects a reaction to the compounds that participate in
            it. A reaction involves many compounds, and a compound can be involved in many reactions.
            The relationship attributes indicate whether a compound is a product or substrate of the
            reaction, as well as its stoichiometry.</Notes>
      <Fields>
        <Field name="product" type="boolean">
          <Notes>TRUE if the compound is a product of the reaction, FALSE if
                    it is a substrate. When a reaction is written on paper in
                    chemical notation, the substrates are left of the arrow and the
                    products are to the right. Sorting on this field will cause
                    the substrates to appear first, followed by the products. If the
                    reaction is reversible, then the notion of substrates and products
                    is not intuitive; however, a value here of FALSE still puts the
                    compound left of the arrow and a value of TRUE still puts it to the
                    right.</Notes>
        </Field>
        <Field name="stoichiometry" type="key-string">
          <Notes>Number of molecules of the compound that participate in a
                    single instance of the reaction. For example, if a reaction
                    produces two water molecules, the stoichiometry of water for the
                    reaction would be two. When a reaction is written on paper in
                    chemical notation, the stoichiometry is the number next to the
                    chemical formula of the compound.</Notes>
        </Field>
        <Field name="main" type="boolean">
          <Notes>TRUE if this compound is one of the main participants in
                    the reaction, else FALSE. It is permissible for none of the
                    compounds in the reaction to be considered main, in which
                    case this value would be FALSE for all of the relevant
                    compounds.</Notes>
        </Field>
        <Field name="loc" type="key-string">
          <Notes>An optional character string that indicates the relative
                    position of this compound in the reaction's chemical formula. The
                    location affects the order in which the compounds are presented when
                    the relationship is crossed from the reaction side: they are sorted by
                    the product/substrate flag first, then by the value of this field, then
                    by the main flag. The default value is an empty string; however, the
                    empty string sorts first, so if this field is used, it should probably
                    be used for every compound in the reaction.</Notes>
        </Field>
        <Field name="discriminator" type="int">
          <Notes>A unique ID for this record. The discriminator does not
                    provide any useful data, but it prevents identical records from
                    being collapsed by the SELECT DISTINCT command used by ERDB to
                    retrieve data.</Notes>
        </Field>
      </Fields>
      <ToIndex>
        <Notes>This index presents the compounds in the reaction in the
                order they should be displayed when writing it in chemical notation.
                All the substrates appear before all the products. Within each of those
                groups, compounds are ordered by location, and among compounds with the
                same location, the main compounds appear first.</Notes>
        <IndexFields>
          <IndexField name="product" order="ascending"/>
          <IndexField name="loc" order="ascending"/>
          <IndexField name="main" order="descending"/>
        </IndexFields>
      </ToIndex>
    </Relationship>
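    <!-- Sketch, not part of the schema: the Involves ToIndex ordering (product ascending, loc
         ascending, main descending) applied in memory to build a display string in chemical
         notation. The Participant type, the " => " arrow, and omitting a stoichiometry of 1
         are illustrative assumptions, not ERDB behavior.

```python
from typing import List, NamedTuple


class Participant(NamedTuple):
    label: str          # compound label
    product: bool       # Involves.product
    stoichiometry: str  # Involves.stoichiometry (stored as a key-string)
    main: bool          # Involves.main
    loc: str = ""       # Involves.loc; empty string by default


def display_order(parts: List[Participant]) -> List[Participant]:
    # product ascending (substrates first), loc ascending, main descending
    return sorted(parts, key=lambda p: (p.product, p.loc, not p.main))


def equation(parts: List[Participant]) -> str:
    def term(p: Participant) -> str:
        # Assumption: a stoichiometry of 1 is not written, as in chemical notation.
        return p.label if p.stoichiometry in ("", "1") else f"{p.stoichiometry} {p.label}"

    ordered = display_order(parts)
    left = " + ".join(term(p) for p in ordered if not p.product)
    right = " + ".join(term(p) for p in ordered if p.product)
    return f"{left} => {right}"
```
    -->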
    <Relationship name="Contains" from="Diagram" to="Compound" arity="MM" converse="IsContainedIn">
      <DisplayInfo theme="web" fixed="1" caption="Is\nContained\nIn" row="2" col="3.5"/>
      <Notes>This relationship indicates that a compound appears on a particular diagram.
            The same compound can appear on many diagrams, and a diagram always contains many
            compounds.</Notes>
    </Relationship>
    <Relationship name="IsContainedIn" from="Feature" to="MachineRole" arity="MM" converse="Contains">
      <DisplayInfo theme="seed" caption="Is\nContained\nIn" row="8" col="5"/>
      <Notes>This relationship connects a machine role to the features that occur in it. A feature
    may occur in many machine roles and a machine role may contain many features. The subsystem
    annotation process is essentially the maintenance of this relationship.</Notes>
    </Relationship>
    <Relationship name="IsRoleOf" from="Role" to="MachineRole" arity="1M" converse="HasRole">
      <DisplayInfo caption="Is\nRole\nOf" theme="seed"/>
      <Notes>This relationship connects a role to the machine roles that represent its
      appearance in a molecular machine. A machine role has exactly one associated role,
      but a role may be represented by many machine roles.</Notes>
    </Relationship>
    <Relationship name="IsTerminusFor" from="Compound" to="Scenario" arity="MM" converse="HasAsTerminus">
      <DisplayInfo theme="web" caption="Is\nTerminus\nFor"/>
      <Notes>A terminus for a scenario is a compound that acts as its input or output. A
            compound can be the terminus for many scenarios, and a scenario will have many termini.
            The relationship attributes indicate whether the compound is an input to the scenario or
            an output.</Notes>
      <Fields>
        <Field name="group-number" type="int">
          <Notes>The group number is 0 for an input compound, 1 for an output compound, and 2 for
                    an auxiliary compound. An auxiliary compound is one that is produced by the
                    scenario but is not the primary output.</Notes>
        </Field>
      </Fields>
      <ToIndex>
        <Notes>This index presents the terminal compounds for a scenario in group
                order.</Notes>
        <IndexFields>
          <IndexField name="group-number" order="ascending"/>
        </IndexFields>
      </ToIndex>
    </Relationship>
    <Relationship name="Exposes" from="ProteinSequence" to="Structure" arity="MM" converse="IsExposedBy">
      <DisplayInfo theme="web" fixed="1" row="7" col="2" caption="Is\nExposed\nBy"/>
      <Notes>This relationship connects a protein sequence to its structural representations. It is a
  many-to-many relationship. Note that only some protein sequences have known structural representations.</Notes>
    </Relationship>
    <Relationship name="IsSubInstanceOf" from="Subsystem" to="Scenario" arity="1M" converse="Validates">
      <DisplayInfo theme="seed" caption="Is Part\nInstance\nOf" fixed="1" row="1" col="7"/>
      <Notes>This relationship connects a scenario to the subsystem it validates. A scenario
            belongs to exactly one subsystem, but a subsystem may have multiple scenarios.</Notes>
    </Relationship>
    <Relationship name="Overlaps" from="Scenario" to="Diagram" arity="MM" converse="IncludesPartOf">
      <DisplayInfo theme="web" fixed="1" row="2" col="5.5"/>
      <Notes>A Scenario overlaps a diagram when the diagram displays a portion of the reactions
            that make up the scenario. A scenario may overlap many diagrams, and a diagram may
            include portions of many scenarios.</Notes>
    </Relationship>
    <Relationship name="HasParticipant" from="Scenario" to="Reaction" arity="MM" converse="ParticipatesIn">
      <DisplayInfo theme="web" caption="Has\nParticipant" row="2" col="4.5" fixed="1"/>
      <Notes>A scenario consists of many participant reactions that convert the input compounds
            to output compounds. A single reaction may participate in many scenarios.</Notes>
      <Fields>
        <Field name="type" type="int">
          <Notes>Indicates the type of participation. If 0, the reaction is in the main pathway of
      the scenario. If 1, the reaction is necessary to make the model work but is not in the
      subsystem. If 2, the reaction is part of the subsystem but should not be included in
      the modelling process.</Notes>
        </Field>
      </Fields>
      <FromIndex>
        <Notes>This index presents the reactions in the scenario in order from
most important to least important.</Notes>
        <IndexFields>
          <IndexField name="type" order="ascending"/>
        </IndexFields>
      </FromIndex>
    </Relationship>
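    <!-- Sketch, not part of the schema: how the HasParticipant type codes might be used to
         choose a scenario's reactions for modelling. Types 0 and 1 are kept and type 2 is
         dropped, as the field notes describe. The result is ordered like the FromIndex. The
         constant names, the reaction IDs, and the tie-break on reaction ID are hypothetical.

```python
from typing import Dict, List

MAIN_PATHWAY, MODEL_SUPPORT, EXCLUDED_FROM_MODEL = 0, 1, 2  # values of HasParticipant.type


def reactions_for_model(participants: Dict[str, int]) -> List[str]:
    """Return a scenario's reactions to feed to a model, most important first.

    participants maps reaction ID to type. Types 0 and 1 are kept; type 2 is
    part of the subsystem but excluded from modelling, per the field notes.
    """
    kept = [(t, rxn) for rxn, t in participants.items() if t != EXCLUDED_FROM_MODEL]
    return [rxn for t, rxn in sorted(kept)]
```
    -->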
    <Relationship name="IsInPair" from="Feature" to="Pairing" arity="MM" converse="Contains">
      <DisplayInfo theme="seed" caption="Is In\nPair"/>
      <Notes>A pairing contains exactly two protein sequences. A protein sequence can
             belong to multiple pairings. When going from a protein sequence to its pairings,
             they are presented in alphabetical order by sequence key.</Notes>
    </Relationship>
    <Relationship name="Concerns" from="Publication" to="ProteinSequence" arity="MM" converse="IsATopicOf">
      <DisplayInfo theme="web" row="8" col="2" caption="Is A\nTopic\nOf" fixed="1"/>
      <Notes>This relationship connects a publication to the protein sequences it
            describes.</Notes>
    </Relationship>
    <Relationship name="IsTaxonomyOf" to="Genome" from="TaxonomicGrouping" arity="1M" converse="IsInTaxa">
      <DisplayInfo theme="nmpdr" fixed="1" caption="Is In\nTaxa" row="9" col="8"/>
      <Notes>A genome belongs to exactly one taxonomic grouping. A taxonomic grouping
  can contain many genomes. Some taxonomic groupings contain no genomes directly; instead,
  they contain other taxonomic groupings.</Notes>
    </Relationship>
    <Relationship name="IsMachineOf" from="MolecularMachine" to="MachineRole" arity="1M" converse="IsRoleOf">
      <DisplayInfo caption="Is\nMachine\nOf" theme="seed"/>
      <Notes>This relationship connects a molecular machine to its various machine roles.
      Each machine has many machine roles, but each machine role belongs to only one machine.</Notes>
    </Relationship>
    <Relationship name="IsSequenceFor" from="ProteinSequence" to="Identifier" arity="1M" converse="IsFeatureFor">
      <DisplayInfo caption="Is\nSequence\nFor" theme="seed"/>
      <Notes>This relationship connects a peg identifier to the protein sequence it produces (if any).
            Only peg identifiers participate in this relationship. Identifiers that name RNAs,
            operons, or other non-protein features do not connect to protein sequences. A single
            protein sequence will frequently have many identifiers.</Notes>
    </Relationship>
    <Relationship name="IncludesIdentifier" from="IdentifierSet" to="Identifier" arity="1M" converse="IsIncludedInSet">
      <DisplayInfo theme="seed" caption="Includes" row="9.5" col="1.5"/>
      <Notes>An identifier set contains many identifiers. If the set identifies a feature, then one of the identifiers 
  will be a feature ID. If the set identifies a protein sequence, then one of the identifiers will be the
  MD5 hash key for the protein sequence.</Notes>
    </Relationship>
  </Relationships>
  <Shapes>
    <Shape type="diamond" name="ConsistsOf" from="Variant" to="Role">
      <DisplayInfo theme="neutral" caption="Belongs To" connected="1"/>
      <Notes>This relationship is not physically implemented in the database. It is
      implicit in the data for a variant. A variant contains a boolean expression that
      describes the various combinations of roles it can contain.</Notes>
    </Shape>
    <Shape type="diamond" name="IsIdentifiedBy" from="Feature" to="Identifier">
      <DisplayInfo theme="neutral" caption="Identifies" connected="1"/>
      <Notes>This relationship is not physically implemented in the database. It is
      implicit in the data for an identifier. If the identifier is a FIG feature
      ID, then it identifies that feature, as do all other identifiers in the same
      identifier set.</Notes>
    </Shape>
  </Shapes>
</Database>