View of /Sprout/SaplingDBD.xml

Revision 1.1
Tue Jul 8 08:56:54 2008 UTC by parrello
Branch: MAIN
CVS Tags: rast_rel_2008_07_21, mgrast_rel_2008_0806, rast_rel_2008_08_07
Initial Sapling DBD.

<?xml version="1.0" encoding="utf-8" ?>
<Database>
  <Title>Sapling Bioinformatics Database</Title>
  <Notes>The Sapling database is a distributable, self-contained copy of the NMPDR data.
    Unlike Sprout, which is optimized for searching, Sapling is designed to be structurally
    simple without sacrificing the ability to find information quickly.</Notes>
  <Issues>
    <Issue>Must add the new "image" data type to ERDB.</Issue>
    <Issue>Must add the new "dna" data type to ERDB.</Issue>
    <Issue>Diagrammer should be able to read real DBDs.</Issue>
    <Issue>Diagrammer should allow editing the DBD.</Issue>
    <Issue>Must add back the ability to index a secondary relation. Note that
            such indexes can only have a single field.</Issue>
    <Issue>We probably need some type tables that describe things like Identifier(source)
            or Family(kind).</Issue>
    <Issue>I'm operating on the assumption that this database will eventually grow into a
            successor for Sprout, hence the name "Sapling". If I'm wrong, then it should be
            renamed "Root".</Issue>
    <Issue>The ERDB documentation needs to be updated to include DisplayInfo, Asides,
            the "converse" attribute for relationships, and the Shapes section.</Issue>
  </Issues>
  <Entities>
    <Entity name="Scenario" keyType="string">
      <DisplayInfo theme="web" col="5" row="1"/>
      <Notes>A scenario is used to verify the validity of subsystem assignments. Each
            scenario converts input compounds to output compounds using reactions.
            The scenario may use all of the reactions controlled by a subsystem or only
            some, and may also incorporate additional reactions.</Notes>
    </Entity>
    <Entity name="Compound" keyType="name-string">
      <DisplayInfo theme="web" col="1" row="3"/>
      <Notes>A compound is a chemical that participates in a reaction.
            All compounds have a unique ID and may also have one or more names. Both
            ligands and reaction components are treated as compounds.</Notes>
      <Fields>
        <Field name="label" type="string">
          <Notes>Primary name of the compound. This is the name used in reaction
                    display strings.</Notes>
        </Field>
        <Field name="name" type="string" relation="CompoundName">
          <Notes>Alternate name for the compound. A compound may have many
                    alternate names. The primary name should also be one of the
                    alternate names.</Notes>
        </Field>
        <Field name="cas-id" type="string" relation="CompoundCAS">
          <Notes>The Chemical Abstract Service ID for the compound. A
                    compound may have at most one CAS ID.</Notes>
        </Field>
        <Field name="zinc-id" type="string" relation="CompoundZinc">
          <Notes>The ZINC database ID for the compound. A compound may
                    have at most one ZINC ID.</Notes>
        </Field>
      </Fields>
      <Indexes>
        <Index>
          <Notes>This index allows searching for compounds by name.</Notes>
          <IndexFields>
            <IndexField name="name" order="ascending"/>
          </IndexFields>
        </Index>
        <Index>
          <Notes>This index allows searching for compounds by CAS ID.</Notes>
          <IndexFields>
            <IndexField name="cas-id" order="ascending"/>
          </IndexFields>
        </Index>
        <Index>
          <Notes>This index allows searching for compounds by ZINC ID.</Notes>
          <IndexFields>
            <IndexField name="zinc-id" order="ascending"/>
          </IndexFields>
        </Index>
      </Indexes>
    </Entity>
    <Entity name="Diagram" keyType="name-string">
      <DisplayInfo theme="web" col="3" row="3"/>
      <Notes>A functional diagram describes a network of chemical reactions, often comprising a single
            subsystem. A diagram is identified by a short name and contains a longer descriptive name.</Notes>
      <Fields>
        <Field name="name" type="text">
          <Notes>Descriptive name of this diagram.</Notes>
        </Field>
        <Field name="content" type="image" relation="DiagramContent">
          <Notes>The content of the diagram, in PNG format encoded as base 64 MIME.</Notes>
        </Field>
      </Fields>
    </Entity>
    <Entity name="Reaction" keyType="key-string">
      <DisplayInfo theme="web" col="5" row="3"/>
      <Notes>A reaction is a chemical process that converts one set of compounds (substrates)
            to another set (products). The reaction ID is generally a small number preceded by a
            letter.</Notes>
      <Fields>
        <Field name="url" type="string" relation="ReactionURL">
          <Notes>HTML string containing a link to a web location that describes the
                    reaction. This field is optional.</Notes>
        </Field>
        <Field name="rev" type="boolean">
          <Notes>TRUE if this reaction is reversible, else FALSE</Notes>
        </Field>
      </Fields>
    </Entity>
    <Entity name="Subsystem" keyType="id-string">
      <DisplayInfo theme="seed" col="7" row="3"/>
      <Notes>A subsystem is a collection of roles that work together in a cell. Identification of subsystems
            is an important tool for recognizing parallel genetic features in different organisms. The key
            is an alphanumeric code string.</Notes>
      <Fields>
        <Field name="name" type="string">
          <Notes>Displayable name of this subsystem.</Notes>
        </Field>
        <Field name="version" type="int">
          <Notes>Version number for the subsystem. This value is incremented each time the subsystem
                    is backed up.</Notes>
        </Field>
        <Field name="curator" type="string">
          <Notes>Name of the person currently in charge of the subsystem.</Notes>
        </Field>
        <Field name="notes" type="text">
          <Notes>Descriptive notes about the subsystem.</Notes>
        </Field>
        <Field name="description" type="text">
          <Notes>Description of the subsystem's function in the cell.</Notes>
        </Field>
        <Field name="classification" type="string">
          <Notes>Classification string, colon-delimited. This string organizes the
                    subsystems into a hierarchy.</Notes>
        </Field>
      </Fields>
      <Indexes>
        <Index>
          <Notes>This index is used to get the subsystems in hierarchical order.</Notes>
          <IndexFields>
            <IndexField name="classification" order="ascending"/>
          </IndexFields>
        </Index>
        <Index>
          <Notes>This index is used to get the subsystem by name.</Notes>
          <IndexFields>
            <IndexField name="name" order="ascending"/>
          </IndexFields>
        </Index>
      </Indexes>
    </Entity>
    <Entity name="Publication" keyType="hash-string">
      <DisplayInfo theme="web" col="1" row="7"/>
      <Notes>A _publication_ is an article or citation that may be used as evidence for
            assertions made in the database. The key is a hash code computed from the URL.</Notes>
      <Fields>
        <Field name="url" type="string">
          <Notes>URL of the article or of its citation.</Notes>
        </Field>
        <Field name="citation" type="text">
          <Notes>Citation string for the article.</Notes>
        </Field>
      </Fields>
      <Indexes>
        <Index>
          <Notes>This index allows searching for the article by the author names and title.</Notes>
          <IndexFields>
            <IndexField name="citation" order="ascending"/>
          </IndexFields>
        </Index>
      </Indexes>
    </Entity>
    <Entity name="EC" keyType="key-string">
      <DisplayInfo theme="web" col="3" row="5"/>
      <Notes>An EC number is a code number associated with one or more particular roles.
            EC numbers are a useful tool for identifying corresponding roles in different
            databases.</Notes>
    </Entity>
    <Entity name="Role" keyType="string">
      <DisplayInfo theme="web" col="5" row="5"/>
      <Notes>A role describes a biological function that may be fulfilled by a feature.
            One of the main goals of the database is to assign features to roles. Most
            roles are effected by the construction of proteins. Some, however, deal with
            functional regulation and message transmission.</Notes>
      <Fields>
        <Field name="hypothetical" type="boolean">
          <Notes>TRUE if a role is hypothetical, else FALSE</Notes>
        </Field>
      </Fields>
    </Entity>
    <Entity name="Variant" keyType="hash-string">
      <DisplayInfo theme="seed" col="7" row="5"/>
      <Notes>A variant is a functional subset of a subsystem. It indicates the particular
            sequence of roles used to implement a metabolic pathway. Variants are abstract
            concepts used to classify machines. The key of the variant is the subsystem ID followed
            by the variant code (usually a numeric string with zero or more decimal points).</Notes>
    </Entity>
    <Entity name="Structure" keyType="string">
      <DisplayInfo theme="web" col="1" row="5"/>
      <Notes>A structure represents a portion of a protein's surface. Structures are used
            to assist in understanding which reactions a protein catalyzes and why. The key of a
            structure is its type followed by an ID. The current types are PDB and CDD, though
            additional types may be added at a later date.</Notes>
    </Entity>
    <Entity name="ProteinSequence" keyType="hash-string">
      <DisplayInfo theme="web" col="3" row="7" caption="Protein Sequence"/>
      <Notes>A protein sequence is a specific sequence of amino acids. Unlike a DNA sequence, a
            protein sequence does not belong to a genome. Identical proteins generated by different
            genomes are generally stored as a single ProteinSequence instance. The key is a
            hash of the protein letter sequence.</Notes>
      <Fields>
        <Field name="sequence" type="dna">
          <Notes>The sequence contains the letters corresponding to the protein's
                    amino acids.</Notes>
        </Field>
        <Field name="iedb" type="text" relation="ProteinSequenceIEDB" special="property_search">
          <Notes>A value indicating whether or not the feature can be found in the
                    Immune Epitope Database. If the feature has not been matched to that database,
                    this field will have no values. Otherwise, it will have an epitope name and/or
                    sequence, hyperlinked to the database.</Notes>
        </Field>
        <Field name="signal-peptide" type="name-string">
          <Notes>The signal peptide location for this feature. This is expressed as start and end
                    numbers with a hyphen for the relevant amino acids. So, "1-22" would indicate a signal
                    peptide at the beginning of the feature's protein and extending through 22 amino acid
                    positions. An empty string means no signal peptide is present.</Notes>
        </Field>
        <Field name="transmembrane-map" type="text">
          <Notes>A map indicating which sections of a protein will be embedded in a membrane.
                    This is expressed as a comma-separated list of start and end numbers with hyphens
                    for the relevant amino acids. So, "10-12, 40-60" would indicate that there are two
                    sections of the protein that become embedded in a membrane: the 10th through 12th
                    amino acids, and the 40th through the 60th. An empty string means no
                    transmembrane regions are known.</Notes>
        </Field>
        <Field name="similar-to-human" type="boolean">
          <Notes>TRUE if this feature generates a protein that is similar to one found in humans,
                    else FALSE</Notes>
        </Field>
        <Field name="isoelectric-point" type="float">
          <Notes>pH in the surrounding medium at which the charge on a protein is neutral.
                    If the pH of the medium is lower than this value, the protein will have a net
                    positive charge. If the pH of the medium is higher, then the protein will have a
                    net negative charge.</Notes>
        </Field>
        <Field name="molecular-weight" type="float">
          <Notes>Molecular weight of this feature's protein, in daltons. A weight of 0
                    indicates that no protein is created.</Notes>
        </Field>
      </Fields>
    </Entity>
    <Entity name="Feature" keyType="id-string">
      <DisplayInfo theme="seed" col="5" row="9"/>
      <Notes>A feature (sometimes also called a gene) is a part of a genome that is of special
            interest. Features may be spread across multiple DNA sequences (contigs) of a genome, but
            never across more than one genome. Each feature in the database has a unique FIG ID.</Notes>
      <Fields>
        <Field name="feature-type" type="id-string">
          <Notes>Code indicating the type of this feature. Among the codes currently
                    supported are "peg" for a protein encoding gene, "bs" for a
                    binding site, "opr" for an operon, and so forth.</Notes>
        </Field>
        <Field name="link" type="text" relation="FeatureLink">
          <Notes>Web hyperlink for this feature. A feature can have no hyperlinks or it can have many. The
                    links are to other websites that have useful information about the gene that the feature represents, and
                    are coded as raw HTML, using an anchor href tag.</Notes>
        </Field>
        <Field name="essential" type="text" relation="FeatureEssential" special="property_search">
          <Notes>A value indicating the essentiality of the feature, coded as HTML. In most
                    cases, this will be a word describing whether the essentiality is confirmed (essential)
                    or potential (potential-essential), hyperlinked to the document from which the
                    essentiality was curated. If a feature is not essential, this field will have no
                    values; otherwise, it may have multiple values.</Notes>
        </Field>
        <Field name="virulent" type="text" relation="FeatureVirulent" special="property_search">
          <Notes>A value indicating the virulence of the feature, coded as HTML. In most
                    cases, this will be a phrase or SA number hyperlinked to the document from which
                    the virulence information was curated. If the feature is not virulent, this field
                    will have no values; otherwise, it may have multiple values.</Notes>
        </Field>
        <Field name="sequence-length" type="counter">
          <Notes>Number of base pairs in this feature.</Notes>
        </Field>
      </Fields>
    </Entity>
    <Entity name="Machine" keyType="key-string">
      <DisplayInfo theme="seed" col="7" row="7"/>
      <Notes>A machine is a collection of features that implements a metabolic pathway. Machines
            are the physical instances of variants. Each machine corresponds to a row in a subsystem
            spreadsheet. The key is the variant key followed by a colon and the Genome ID.</Notes>
      <Fields>
        <Field name="type" type="key-string">
          <Notes>The machine type indicates how it relates to the parent variant. A type
                    of "vacant" means that the machine does not appear to actually exist in the
                    organism. A type of "incomplete" means that the machine appears to be missing
                    many reactions. In all other cases, the type is "normal".</Notes>
        </Field>
      </Fields>
    </Entity>
    <Entity name="Identifier" keyType="string">
      <DisplayInfo theme="seed" col="4" row="10"/>
      <Notes>An identifier is an alternate name for a feature.</Notes>
      <Fields>
        <Field name="source" type="key-string">
          <Notes>Specific type of the identifier, such as its source database or category.
                    The type can usually be decoded to convert the identifier to a URL.</Notes>
        </Field>
      </Fields>
      <Indexes>
        <Index>
          <Notes>This index allows all the identifiers of a specified type to be located.</Notes>
          <IndexFields>
            <IndexField name="source" order="ascending"/>
          </IndexFields>
        </Index>
      </Indexes>
    </Entity>
    <Entity name="Assignment" keyType="hash-string">
      <DisplayInfo col="5" row="7" theme="seed"/>
      <Notes>An assignment connects a feature to its putative role. The key of the
            assignment is the feature ID followed by a timestamp.</Notes>
    </Entity>
    <Entity name="EvidenceClass" keyType="name-string">
      <DisplayInfo col="6" row="9" theme="seed"/>
      <Notes>An evidence class describes a general type of evidence code. An actual evidence
            code consists of its class (e.g. "dlit", "ff") and an optional modifier. The modifier
            is contained in the relationship between the class and the target assignment.</Notes>
      <Fields>
        <Field name="format" type="string">
          <Notes>The format string is an example showing how the modifier portion of the
                    evidence code is formatted. It may contain HTML markup.</Notes>
        </Field>
        <Field name="short-description" type="string">
          <Notes>The short description is a brief noun phrase explanation of the
                    evidence class.</Notes>
        </Field>
        <Field name="description" type="text">
          <Notes>The description is a long text description of the evidence class and its
                    format string.</Notes>
        </Field>
      </Fields>
    </Entity>
    <Entity name="Family" keyType="name-string">
      <DisplayInfo theme="seed" col="5" row="11"/>
      <Notes>A family is a group of features united by a particular determination algorithm.
            The algorithm will frequently, but not always, signify a functional role.</Notes>
    </Entity>
    <Entity name="Genome" keyType="name-string">
      <DisplayInfo theme="nmpdr" col="7" row="9" caption="Genome Organism"/>
      <Notes>Genome objects are organized in a hierarchy. At the bottom are the true genomes and
            meta-genomes that connect to the rest of the database. Above them is a hierarchy
            based on taxonomic classification.</Notes>
      <Fields>
        <Field name="full-name" type="name-string">
          <Notes>Full name of the genome. This is either the taxonomic classification name
                    or a genus/species/strain name.</Notes>
        </Field>
        <Field name="level" type="int">
          <Notes>Taxonomic classification level. A level of 0 indicates that this is
                    a specific strain with DNA attached. Higher levels indicate progressively
                    larger classifications. Each level number represents a specific type of
                    classification. Sub-species is always 1, species is always 2, genus is always
                    3, and so forth, up to 99 for domain. This means that as you travel up the
                    taxonomy tree, the ranks will be non-sequential.</Notes>
        </Field>
        <Field name="domain" type="name-string">
          <Notes>Domain for this genome or taxonomic classification. The domain is
                    the highest level of the taxonomy tree.</Notes>
        </Field>
        <Field name="version" type="name-string">
          <Notes>Version string for this genome, generally consisting of the genome ID followed
                    by a period and a string of digits.</Notes>
        </Field>
        <Field name="complete" type="boolean">
          <Notes>TRUE if the genome is complete, else FALSE</Notes>
        </Field>
        <Field name="dna-size" type="counter">
          <Notes>Number of base pairs in the genome.</Notes>
        </Field>
        <Field name="primary-group" type="name-string">
          <Notes>The primary NMPDR group for this organism. There is always exactly one NMPDR
                    group per organism. An empty string indicates the organism is supporting. In general,
                    more data is kept on organisms in NMPDR groups than on supporting organisms.</Notes>
        </Field>
        <Field name="contigs" type="int">
          <Notes>Number of contigs for this organism.</Notes>
        </Field>
        <Field name="pegs" type="int">
          <Notes>Number of protein encoding genes for this organism</Notes>
        </Field>
        <Field name="rnas" type="int">
          <Notes>Number of RNA features found for this organism.</Notes>
        </Field>
      </Fields>
      <Indexes>
        <Index>
          <Notes>This index allows the applications to find all genomes associated with
                    a specific primary (NMPDR) group.</Notes>
          <IndexFields>
            <IndexField name="primary-group" order="ascending"/>
            <IndexField name="full-name" order="ascending"/>
          </IndexFields>
        </Index>
        <Index>
          <Notes>This index allows the applications to find all genomes in lexical
                    order by name. Organisms will show up first, alphabetical by species and
                    strain name, followed by the various taxonomic classifications grouped by
                    increasing inclusivity. (In other words,</Notes>
          <IndexFields>
            <IndexField name="level" order="ascending"/>
            <IndexField name="full-name" order="ascending"/>
          </IndexFields>
        </Index>
      </Indexes>
    </Entity>
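    <!-- Illustrative sketch only; not part of the DBD. Assuming the level numbers
         given in the "level" field notes (0 = strain with DNA, 2 = species,
         3 = genus, 99 = domain), the level/full-name index above would present
         genomes roughly in the order below. The record notation and the sample
         organisms are hypothetical and chosen only to show the sort order.

           <Genome level="0"  full-name="Bacillus subtilis subsp. subtilis str. 168"/>
           <Genome level="0"  full-name="Escherichia coli K12"/>
           <Genome level="2"  full-name="Bacillus subtilis"/>
           <Genome level="2"  full-name="Escherichia coli"/>
           <Genome level="3"  full-name="Bacillus"/>
           <Genome level="3"  full-name="Escherichia"/>
           <Genome level="99" full-name="Bacteria"/>

         Strains sort first because their level is 0; within a level, entries are
         alphabetical by full name. -->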
    <Entity name="Pairing" keyType="name-string">
      <DisplayInfo theme="seed" col="3" row="9"/>
      <Notes>A pairing indicates that two protein sequences are found close together on one or
            more DNA sequences. Not all possible pairings are stored in the database; only those that
            are considered for some reason to be significant for annotation purposes. The pairing
            includes a score that indicates how many of the DNA sequences are significantly
            dissimilar. A higher score indicates a stronger pairing. The key of the pairing is the
            concatenation of the protein sequence keys in alphabetical order.</Notes>
      <Asides>Because the protein sequence key is a hash of the sequence letters, the key of a pairing between two
            sequences is computable from the sequences themselves. Theoretically, the pairing
            is unordered: (A,B) and (B,A) are the same pairing. It is frequently the case,
            however, that we need to refer to the "first" or "second" protein in the pairing.
            When this happens, the first one is always the protein with the alphabetically
            lesser key. The IsInPair relationship automatically shows the proteins in this
            order.</Asides>
      <Fields>
        <Field name="score" type="int">
          <Notes>Coupling score for this pairing. A higher score indicates a stronger
                    coupling.</Notes>
        </Field>
      </Fields>
    </Entity>
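    <!-- Illustrative sketch only; not part of the DBD. Assuming two hypothetical
         ProteinSequence keys "1f0c" and "9ab2" (real keys are hashes of the sequence
         letters) and plain concatenation with no separator (the notes above do not
         specify one), the pairing gets the same key whichever protein is named first.
         The record notation, including the from-link and to-link attribute names,
         is an assumption made for this example.

           <Pairing id="1f0c9ab2"/>
           <IsInPair from-link="1f0c" to-link="1f0c9ab2"/>
           <IsInPair from-link="9ab2" to-link="1f0c9ab2"/>

         Here "1f0c" is the "first" protein because its key is alphabetically lesser. -->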
    <Entity name="EvidenceSet" keyType="int">
      <DisplayInfo theme="seed" col="3" row="11" caption="Evidence Set"/>
      <Notes>An evidence set indicates evidence for a functional connection between protein
            sequence pairs. The protein sequences possessing the connection are the ones that
            participate in the evidence set's pairings.</Notes>
      <Asides>The pairings for a particular evidence set
            will contain protein sequences that are significantly similar. In other words, if
            (A,B) and (X,Y) are both pairings in a single evidence set, then (A =~ X) and
            (B =~ Y) or (A =~ Y) and (B =~ X).</Asides>
      <Fields>
        <Field name="score" type="int">
          <Notes>Score for this evidence set. The score indicates the number of
                    significantly different genomes represented by the pairings.</Notes>
        </Field>
      </Fields>
    </Entity>
    <Entity name="DnaSequence" keyType="name-string">
      <DisplayInfo theme="nmpdr" col="7" row="11" caption="DNA Sequence"/>
      <Notes>A DNA sequence (sometimes called a "contig") is a contiguous sequence of base pairs
            belonging to a single genome. The key of the DNA sequence is the genome ID followed by
            the contig ID.</Notes>
      <Fields>
        <Field name="length" type="counter">
          <Notes>Number of base pairs in the DNA sequence.</Notes>
        </Field>
        <Field name="bases" type="text" relation="DnaSequenceBases">
          <Notes>A string of letters representing the nucleotides of the sequence.</Notes>
        </Field>
      </Fields>
    </Entity>
  </Entities>
  <Relationships>
    <Relationship name="IsTargetOf" from="Role" to="Assignment" arity="1M" converse="Targets">
      <DisplayInfo theme="seed" caption="Is\nTarget\nOf"/>
      <Notes>This relationship connects an assignment to the target role. A role has
            many assignments, but an assignment targets exactly one role.</Notes>
    </Relationship>
    <Relationship name="IsAnnotatedBy" from="Feature" to="Assignment" arity="1M" converse="Annotates">
      <DisplayInfo theme="seed" caption="Annotates"/>
      <Notes>This relationship connects a feature to the assignments that annotate it.
            A feature may have several assignments, but an assignment annotates exactly one
            feature.</Notes>
      <Fields>
        <Field name="time-stamp" type="date">
          <Notes>Time at which the assignment was made.</Notes>
        </Field>
        <Field name="annotator" type="string">
          <Notes>Name of the annotator who made the assignment.</Notes>
        </Field>
        <Field name="active" type="boolean">
          <Notes>TRUE if this assignment is active; FALSE if it has been
                    superseded.</Notes>
        </Field>
      </Fields>
      <FromIndex>
        <Notes>This index presents the assignments in order from the most
                recent to the least recent, with active assignments first.</Notes>
        <IndexFields>
          <IndexField name="active" order="descending"/>
          <IndexField name="time-stamp" order="descending"/>
        </IndexFields>
      </FromIndex>
    </Relationship>
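    <!-- Illustrative sketch only; not part of the DBD. Assuming a hypothetical feature
         F1 with three assignments, the from-index above (active descending, then
         time-stamp descending) would present them in the order below. The record
         notation, the IDs, and the readable date strings are all hypothetical.

           <IsAnnotatedBy from-link="F1" to-link="A3" active="1" time-stamp="2008-07-01"/>
           <IsAnnotatedBy from-link="F1" to-link="A2" active="0" time-stamp="2008-06-15"/>
           <IsAnnotatedBy from-link="F1" to-link="A1" active="0" time-stamp="2008-05-02"/>

         The active assignment comes first, followed by the inactive ones from newest
         to oldest. -->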
    <Relationship name="IsEvidencedBy" from="Assignment" to="EvidenceClass" arity="MM" converse="IsEvidenceFor">
      <DisplayInfo theme="seed" caption="Is\nEvidenced\nBy" fixed="1" col="6" row="8" />
      <Notes>This relationship contains the evidence for an assignment. An assignment will
            have one or more evidence codes, and each evidence class will justify an enormous
            number of assignments. The intersection data contains details about the evidence.</Notes>
      <Fields>
        <Field name="modifier" type="string">
          <Notes>A modifier for the evidence class. The modifier is concatenated to the
                    class to form the complete evidence code. Frequently, the modifier will be the
                    ID of a family, subsystem, or evidence set.</Notes>
        </Field>
      </Fields>
    </Relationship>
    <Relationship name="IsTerminusFor" from="Compound" to="Scenario" arity="MM" converse="HasAsTerminus">
      <DisplayInfo caption="Has As\nTerminus"/>
      <Notes>A terminus for a scenario is a compound that acts as its input or output. A compound
            can be the terminus for many scenarios, and a scenario will have many termini. The relationship
            attributes indicate whether the compound is an input to the scenario or an output. In some
            cases, there may be multiple alternative output groups. This is also indicated by the
            attributes.</Notes>
      <Fields>
        <Field name="group-number" type="int">
          <Notes>If zero, then the compound is an input. Otherwise, this is the index number
                    of the output group. Each output group represents an alternative set of output
                    compounds.</Notes>
        </Field>
      </Fields>
      <ToIndex>
        <Notes>This index allows the application to view a scenario's compounds by group.</Notes>
        <IndexFields>
          <IndexField name="group-number" order="ascending"/>
        </IndexFields>
      </ToIndex>
    </Relationship>
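    <!-- Illustrative sketch only; not part of the DBD. Assuming a hypothetical scenario
         S1 with two inputs and two alternative output groups, the termini could be
         recorded as follows. The record notation and all IDs are hypothetical.

           <IsTerminusFor from-link="C1" to-link="S1" group-number="0"/>
           <IsTerminusFor from-link="C2" to-link="S1" group-number="0"/>
           <IsTerminusFor from-link="C3" to-link="S1" group-number="1"/>
           <IsTerminusFor from-link="C4" to-link="S1" group-number="2"/>

         C1 and C2 are inputs (group 0). C3 alone forms output group 1, and C4 alone
         forms the alternative output group 2. -->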
    <Relationship name="HasAlias" from="Feature" to="Identifier" arity="MM" converse="IsAliasOf">
      <DisplayInfo theme="seed" fixed="1" col="4" row="9" caption="Has Alias"/>
      <Notes>An identifier is an alias for multiple features. A feature may have multiple alias
            identifiers.</Notes>
    </Relationship>
    <Relationship name="Justifies" from="EvidenceSet" to="Family" arity="MM" converse="IsJustifiedBy">
      <DisplayInfo theme="seed" caption="Is\nJustified\nBy"/>
      <Notes>A family may use multiple sets as evidence. In general, an evidence set will
            justify two families, one for each side of the pairing.</Notes>
    </Relationship>
    <Relationship name="IsDeterminedBy" from="EvidenceSet" to="Pairing" arity="MM" converse="Determines">
      <DisplayInfo theme="seed" caption="Determines"/>
      <Notes>An evidence set exists because it has pairings in it, and this relationship
            connects the evidence set to its constituent pairings. A pairing can belong to
            multiple evidence sets.</Notes>
      <Fields>
        <Field name="inverted" type="boolean">
          <Notes>A pairing is an unordered pair of protein sequences, but its
                    similarity to other pairings in an evidence set is ordered. Let (A,B) be
                    a pairing and (X,Y) be another pairing in the same set. If this flag is
                    FALSE, then (A =~ X) and (B =~ Y). If this flag is TRUE, then (A =~ Y) and
                    (B =~ X).</Notes>
        </Field>
      </Fields>
    </Relationship>
    <Relationship name="IsInPair" from="ProteinSequence" to="Pairing" arity="MM" converse="Contains">
      <DisplayInfo theme="seed" caption="Is In\nPair"/>
      <Notes>A pairing contains exactly two protein sequences. A protein sequence can
            belong to multiple pairings. When going from a protein sequence to its pairings,
            they are presented in alphabetical order by sequence key.</Notes>
    </Relationship>
    <Relationship name="HasMember" from="Family" to="Feature" arity="1M" converse="IsMemberOf">
      <DisplayInfo theme="seed" caption="Is\nMember\nOf" row="11.5" col="5"/>
      <Notes>This relationship connects each feature family to its constituent
            features. A family always has many features, but a single feature can
            be found in at most one family.</Notes>
    </Relationship>
    <Relationship name="IsClassOf" from="Genome" to="Genome" arity="1M" converse="IsClassifiedAs">
      <DisplayInfo theme="nmpdr" col="8" row="9" fixed="1" caption="Is\nClass\nOf"/>
      <Notes>The recursive IsClassOf relationship organizes Genomes into a hierarchy
            based on the standard taxonomy. Only genomes at the bottom of the hierarchy have
            actual DNA attached.</Notes>
    </Relationship>
    <Relationship name="ConsistsOf" from="Variant" to="Role" arity="MM">
      <DisplayInfo theme="seed" connected="1" caption="Belongs To"/>
      <Notes>A variant is essentially a sequence of roles. Roles can belong to many
            variants. Some roles will not belong to any variants.</Notes>
    </Relationship>
    <Relationship name="Contains" from="Diagram" to="Compound" arity="MM" converse="IsContainedIn">
      <DisplayInfo theme="web" caption="Is\nContained\nIn"/>
      <Notes>This relationship indicates that a compound appears on a particular diagram.
            The same compound can appear on many diagrams, and a diagram always contains many
            compounds.</Notes>
    </Relationship>
    <Relationship name="Includes" from="Subsystem" to="Role" arity="MM" converse="IsIncludedIn">
      <DisplayInfo theme="seed" caption="Includes"/>
      <Notes>A subsystem is defined by its roles. The subsystem's variants contain slightly
            different sets of roles, but all of the roles in a variant must be connected to the
            parent subsystem by this relationship.</Notes>
      <Fields>
        <Field name="sequence" type="counter">
          <Notes>Sequence number of the role within the subsystem. When the roles
                    are formed into a variant, they will generally appear in sequence order.</Notes>
        </Field>
      </Fields>
      <FromIndex>
        <Notes>This index ensures that the roles of the subsystem are presented in sequence
                order.</Notes>
        <IndexFields>
          <IndexField name="sequence" order="ascending"/>
        </IndexFields>
      </FromIndex>
    </Relationship>
    <Relationship name="Describes" from="Subsystem" to="Variant" arity="1M" converse="IsDescribedBy">
      <DisplayInfo theme="seed"/>
      <Notes>This relationship connects a subsystem to the individual variants used
            to implement it. Each variant contains a slightly different subset of the
            roles in the parent subsystem.</Notes>
    </Relationship>
    <Relationship name="Shows" from="Diagram" to="Reaction" arity="MM" converse="IsShowedOn">
      <DisplayInfo theme="web"/>
      <Notes>This relationship connects a diagram to its reactions. A diagram shows multiple
            reactions, and a reaction can be on many diagrams.</Notes>
    </Relationship>
    <Relationship name="Performs" theme="web" from="Reaction" to="Role" arity="MM">
      <DisplayInfo theme="web"/>
      <Notes>A reaction performs many roles. A role can be performed by many
            reactions.</Notes>
    </Relationship>
    <Relationship name="IsImplementedBy" from="Variant" to="Machine" arity="1M" converse="Implements">
      <DisplayInfo theme="seed" caption="Is\nImplemented\nBy"/>
      <Notes>This relationship connects a variant to the physical machines that implement
            it in the genomes. A variant is implemented by many machines, but a machine belongs to
            only one variant.</Notes>
    </Relationship>
    <Relationship name="Involves" from="Reaction" to="Compound" arity="MM" converse="IsInvolvedIn">
      <DisplayInfo theme="web" col="3" row="4" fixed="1" caption="Is\nInvolved\nIn"/>
      <Notes>This relationship connects a reaction to the compounds that participate in
            it. A reaction involves many compounds, and a compound can be involved in many reactions.
            The relationship attributes indicate whether a compound is a product or substrate of the
            reaction, as well as its stoichiometry.</Notes>
      <Fields>
        <Field name="product" type="boolean">
          <Notes>TRUE if the compound is a product of the reaction, FALSE if
                    it is a substrate. When a reaction is written on paper in
                    chemical notation, the substrates are left of the arrow and the
                    products are to the right. Sorting on this field will cause
                    the substrates to appear first, followed by the products. If the
                    reaction is reversible, then the notion of substrates and products
                    is not intuitive; however, a value here of FALSE still puts the
                    compound left of the arrow and a value of TRUE still puts it to the
                    right.</Notes>
        </Field>
        <Field name="stoichiometry" type="key-string">
          <Notes>Number of molecules of the compound that participate in a
                    single instance of the reaction. For example, if a reaction
                    produces two water molecules, the stoichiometry of water for the
                    reaction would be two. When a reaction is written on paper in
                    chemical notation, the stoichiometry is the number next to the
                    chemical formula of the compound.</Notes>
        </Field>
        <Field name="main" type="boolean">
          <Notes>TRUE if this compound is one of the main participants in
                    the reaction, else FALSE. It is permissible for none of the
                    compounds in the reaction to be considered main, in which
                    case this value would be FALSE for all of the relevant
                    compounds.</Notes>
        </Field>
        <Field name="loc" type="key-string">
          <Notes>An optional character string that indicates the relative
                    position of this compound in the reaction's chemical formula. The
                    location affects the order in which the compounds are presented as we cross the
                    relationship from the reaction side. The product/substrate flag
                    comes first, then the value of this field, then the main flag.
                    The default value is an empty string; however, the empty string
                    sorts first, so if this field is used, it should probably be
                    used for every compound in the reaction.</Notes>
        </Field>
        <Field name="discriminator" type="int">
          <Notes>A unique ID for this record. The discriminator does not
                    provide any useful data, but it prevents identical records from
                    being collapsed by the SELECT DISTINCT command used by ERDB to
                    retrieve data.</Notes>
        </Field>
      </Fields>
      <ToIndex>
        <Notes>This index presents the compounds in the reaction in the
                order they should be displayed when writing it in chemical notation.
                All the substrates appear before all the products. Within each group,
                the compounds are ordered by location, and the main compounds appear
                first among those with the same location.</Notes>
        <IndexFields>
          <IndexField name="product" order="ascending"/>
          <IndexField name="loc" order="ascending"/>
          <IndexField name="main" order="descending"/>
        </IndexFields>
      </ToIndex>
    </Relationship>
    <Relationship name="IsSourceOf" from="Machine" to="Assignment" arity="1M" converse="HasSource">
      <DisplayInfo theme="seed" caption="Has Source"/>
      <Notes>This relationship connects a machine to the assignments made in its name.
            A machine is the source of many assignments, but an assignment belongs to at most
            one machine.</Notes>
    </Relationship>
    <Relationship name="Uses" theme="seed" from="Genome" to="Machine" arity="1M" converse="IsUsedBy">
      <DisplayInfo theme="seed" caption="Is\nUsed\nBy"/>
      <Notes>This relationship connects a genome to the machines that form its
            metabolic pathways. A genome can use many machines, but a machine is used by exactly
            one genome.</Notes>
    </Relationship>
    <Relationship name="Catalyzes" from="ProteinSequence" to="Role" arity="MM" converse="IsCatalyzedBy">
      <DisplayInfo theme="web" caption="Is\nCatalyzed\nBy"/>
      <Notes>This relationship connects a protein sequence to the functional roles it
            catalyzes in the cell. A protein sequence can catalyze many roles, and a role can
            be catalyzed by many protein sequences. Roles that perform regulatory or message
            transmission functions do not participate in this relationship.</Notes>
    </Relationship>
    <Relationship name="IsProducedBy" from="ProteinSequence" to="Feature" arity="1M" converse="Produces">
      <DisplayInfo caption="Is\nProduced\nBy" theme="seed" row="10" col="1.5"/>
      <Notes>This relationship connects a feature to the protein sequence it produces (if any).
            Many features can produce the same protein sequence, but each feature produces at most
            one protein sequence. Many features do not produce a protein sequence at all.</Notes>
    </Relationship>
    <Relationship name="IsLocatedIn" from="Feature" to="DnaSequence" arity="MM" converse="IsLocusFor">
      <DisplayInfo theme="seed" caption="Is\nLocated\nIn" fixed="1" row="11" col="6" />
      <Notes>A feature is a set of DNA sequence fragments. Most features are a single contiguous
            fragment, so they are located in only one DNA sequence; however, fragments have a maximum
            length, so even a single contiguous feature may participate in this relationship multiple
            times. A few features belong to multiple DNA sequences. In that case, however, all the
            DNA sequences belong to the same genome. A DNA sequence itself will frequently have
            thousands of features connected to it.</Notes>
      <Fields>
        <Field name="locN" type="int">
          <Notes>Sequence number of this segment.</Notes>
        </Field>
        <Field name="beg" type="int">
          <Notes>Index (1-based) of the first residue in the contig that
                    belongs to the segment.</Notes>
        </Field>
        <Field name="len" type="int">
          <Notes>Number of residues in the segment. A length of 0 identifies
                    a specific point between residues. This is the point before the residue if the direction
                    is forward and the point after the residue if the direction is backward.</Notes>
        </Field>
        <Field name="dir" type="char">
          <Notes>Direction of the segment: "+" if it is forward and
                    "-" if it is backward.</Notes>
        </Field>
      </Fields>
      <FromIndex>
        <Notes>This index allows the application to find all the segments of a feature in
                the proper order.</Notes>
        <IndexFields>
          <IndexField name="locN" order="ascending"/>
        </IndexFields>
      </FromIndex>
      <ToIndex>
        <Notes>This index is the one used by applications to find all the feature
                segments that contain a specific residue.</Notes>
        <IndexFields>
          <IndexField name="beg" order="ascending"/>
        </IndexFields>
      </ToIndex>
    </Relationship>
    <Relationship name="IsOwnerOf" from="Genome" to="Feature" arity="1M" converse="IsOwnedBy">
      <DisplayInfo caption="Is\nOwned\nBy" theme="seed" fixed="1" row="10" col="6" />
      <Notes>This relationship connects each feature to its parent genome.</Notes>
    </Relationship>
    <Relationship name="IsMadeUpOf" from="Genome" to="DnaSequence" arity="1M" converse="MakesUp">
      <DisplayInfo theme="nmpdr" caption="Is\nMade Up\nOf"/>
      <Notes>This relationship connects each genome to the DNA sequences that make it up.</Notes>
    </Relationship>
    <Relationship name="Exposes" from="ProteinSequence" to="Structure" arity="MM" converse="IsExposedBy">
      <DisplayInfo theme="web" caption="Is\nExposed\nBy"/>
      <Notes>This relationship connects a protein sequence to the chemically active structures
            on its surface. A protein sequence exposes many structures, and a particular structure
            may occur on many proteins.</Notes>
    </Relationship>
    <Relationship name="Attracts" from="Structure" to="Compound" arity="MM" converse="IsAttractedTo">
      <DisplayInfo theme="web" caption="Is\nAttracted\nTo"/>
      <Notes>This relationship connects a compound to the protein structures that attract it.
            This is an incomplete relationship that exists to service drug targeting queries. Only
            the attractions whose parameters have been determined through modeling or
            experimentation are included. The goal is to determine the docking energy between
            the compound and the protein structure.</Notes>
      <Fields>
        <Field name="reason" type="id-string">
          <Notes>Indication of the reason for determining the docking energy.
                    A value of "Random" indicates the docking was attempted as a part
                    of a random survey used to determine the docking characteristics of a
                    protein structure. A value of "Rich" indicates the docking was attempted
                    because a low-energy docking result was predicted for the compound.</Notes>
        </Field>
        <Field name="tool" type="id-string">
          <Notes>Name of the tool used to compute the docking energy.</Notes>
        </Field>
        <Field name="total-energy" type="float">
          <Notes>Total energy required for the compound to dock with the structure,
                    in kcal/mol. A negative value means energy is released.</Notes>
        </Field>
        <Field name="vanderwalls-energy" type="float">
          <Notes>Docking energy in kcal/mol that results from the geometric fit
                    (Van der Waals force) between the structure and the compound.</Notes>
        </Field>
        <Field name="electrostatic-energy" type="float">
          <Notes>Docking energy in kcal/mol that results from the movement of
                    electrons (electrostatic force) between the structure and the
                    compound.</Notes>
        </Field>
      </Fields>
      <FromIndex>
        <Notes>This index enables the application to view a structure's docking results from
                the lowest energy (best docking) to highest energy (worst docking).</Notes>
        <IndexFields>
          <IndexField name="total-energy" order="ascending"/>
        </IndexFields>
      </FromIndex>
      <ToIndex>
        <Notes>This index enables the application to view a compound's docking results from
                the lowest energy (best docking) to highest energy (worst docking).</Notes>
        <IndexFields>
          <IndexField name="total-energy" order="ascending"/>
        </IndexFields>
      </ToIndex>
    </Relationship>
    <Relationship name="IsTerminusFor" from="Compound" to="Scenario" arity="MM" converse="HasAsTerminus">
      <DisplayInfo theme="web" caption="Has As\nTerminus"/>
      <Notes>A terminus for a scenario is a compound that acts as its input or output. A
            compound can be the terminus for many scenarios, and a scenario will have many termini.
            The relationship attributes indicate whether the compound is an input to the scenario or
            an output. In some cases, there may be multiple alternative output groups. This is also
            indicated by the attributes.</Notes>
      <Fields>
        <Field name="group-number" type="int">
          <Notes>The group number is 0 for an input compound; otherwise, it is the
                    number of the output group to which the compound belongs. Output groups
                    represent alternative outputs for the scenario. A compound in multiple
                    output groups will appear multiple times in this relationship.</Notes>
        </Field>
      </Fields>
      <ToIndex>
        <Notes>This index presents the terminal compounds for a scenario in group
                order.</Notes>
        <IndexFields>
          <IndexField name="group-number" order="ascending"/>
        </IndexFields>
      </ToIndex>
    </Relationship>
    <Relationship name="Overlaps" from="Scenario" to="Diagram" arity="MM" converse="IncludesPartOf">
      <DisplayInfo theme="web"/>
      <Notes>A scenario overlaps a diagram when the diagram displays a portion of the reactions
            that make up the scenario. A scenario may overlap many diagrams, and a diagram may
            include portions of many scenarios.</Notes>
    </Relationship>
    <Relationship name="HasParticipant" from="Scenario" to="Reaction" arity="MM" converse="ParticipatesIn">
      <DisplayInfo theme="web" caption="\nParticipates\nIn"/>
      <Notes>A scenario consists of many participant reactions that convert the input compounds
            to output compounds. A single reaction may participate in many scenarios.</Notes>
    </Relationship>
    <Relationship name="IsValidatedBy" from="Subsystem" to="Scenario" arity="1M" converse="Validates">
      <DisplayInfo theme="seed" caption="Is\nValidated\nBy"/>
      <Notes>This relationship connects a scenario to the subsystem it validates. A scenario
            validates exactly one subsystem, but a subsystem may have multiple scenarios used for
            validation.</Notes>
    </Relationship>
    <Relationship name="Concerns" from="Publication" to="ProteinSequence" arity="MM" converse="IsATopicOf">
      <DisplayInfo theme="web"/>
      <Notes>This relationship connects a publication to the protein sequences it
            describes.</Notes>
    </Relationship>
    <Relationship name="Identifies" from="EC" to="Role" arity="1M" converse="IsIdentifiedBy">
      <DisplayInfo theme="web"/>
      <Notes>This relationship connects an EC number code to its relevant roles. A role will
            only have one EC number, but an EC number can identify multiple roles.</Notes>
    </Relationship>
  </Relationships>
  <Shapes>
  </Shapes>
</Database>
