Diff of /Sprout/SproutDBD.xml

revision 1.23, Wed Apr 19 03:36:29 2006 UTC revision 1.51, Tue Feb 5 05:46:03 2008 UTC
# Line 1  Line 1 
1  <?xml version="1.0" encoding="utf-8" ?>  <?xml version="1.0" encoding="utf-8" ?>
2  <Database>  <Database>
3      <Title>Sprout Genome and Subsystem Database</Title>      <Title>Sprout Genome and Subsystem Database</Title>
4        <Notes>The Sprout database contains the genetic data for all complete organisms in the SEED.
5        The data that is not in Sprout-- attributes, similarities, couplings-- is stored on external
6        servers available to the Sprout software. The Sprout database is reloaded approximately once
7        per month. There is significant redundancy in the Sprout database because it has been
8        optimized for searching. In particular, the Feature table contains an extra copy of the
9        feature's functional role and a list of possible search terms.</Notes>
10      <Entities>      <Entities>
11          <Entity name="Genome" keyType="name-string">          <Entity name="Genome" keyType="name-string">
12              <Notes>A [i]genome[/i] contains the sequence data for a particular individual organism.</Notes>              <Notes>A [i]genome[/i] contains the sequence data for a particular individual organism.</Notes>
13              <Fields>              <Fields>
14                  <Field name="genus" type="name-string">                  <Field name="genus" type="name-string">
15                      <Notes>Genus of the relevant organism.</Notes>                      <Notes>Genus of the relevant organism.</Notes>
                     <DataGen pass="1">RandParam('streptococcus', 'staphyloccocus', 'felis', 'homo', 'ficticio', 'strangera', 'escherischia', 'carborunda')</DataGen>  
16                  </Field>                  </Field>
17                  <Field name="species" type="name-string">                  <Field name="species" type="name-string">
18                      <Notes>Species of the relevant organism.</Notes>                      <Notes>Species of the relevant organism.</Notes>
                     <DataGen pass="1">StringGen('PKVKVKVKVKV')</DataGen>  
19                  </Field>                  </Field>
20                  <Field name="unique-characterization" type="medium-string">                  <Field name="unique-characterization" type="medium-string">
21                      <Notes>The unique characterization identifies the particular organism instance from which the                      <Notes>The unique characterization identifies the particular organism instance from which the
22                      genome is taken. It is possible to have in the database more than one genome for a                      genome is taken. It is possible to have in the database more than one genome for a
23                      particular species, and every individual organism has variations in its DNA.</Notes>                      particular species, and every individual organism has variations in its DNA.</Notes>
24                      <DataGen>StringGen('PKVKVK999')</DataGen>                  </Field>
25                    <Field name="version" type="name-string">
26                        <Notes>version string for this genome, generally consisting of the genome ID followed
27                        by a period and a string of digits.</Notes>
28                  </Field>                  </Field>
29                  <Field name="access-code" type="key-string">                  <Field name="access-code" type="key-string">
30                      <Notes>The access code determines which users can look at the data relating to this genome.                      <Notes>The access code determines which users can look at the data relating to this genome.
31                      Each user is associated with a set of access codes. In order to view a genome, one of                      Each user is associated with a set of access codes. In order to view a genome, one of
32                      the user's access codes must match this value.</Notes>                      the user's access codes must match this value.</Notes>
                     <DataGen>RandParam('low','medium','high')</DataGen>  
33                  </Field>                  </Field>
34                  <Field name="complete" type="boolean">                  <Field name="complete" type="boolean">
35                      <Notes>TRUE if the genome is complete, else FALSE</Notes>                      <Notes>TRUE if the genome is complete, else FALSE</Notes>
36                  </Field>                  </Field>
37                    <Field name="dna-size" type="counter">
38                        <Notes>number of base pairs in the genome</Notes>
39                    </Field>
40                  <Field name="taxonomy" type="text">                  <Field name="taxonomy" type="text">
41                      <Notes>The taxonomy string contains the full taxonomy of the organism, with individual elements                      <Notes>The taxonomy string contains the full taxonomy of the organism, with individual elements
42                      separated by semi-colons (and optional white space), starting with the domain and ending with                      separated by semi-colons (and optional white space), starting with the domain and ending with
43                      the disambiguated genus and species (which is the organism's scientific name plus an                      the disambiguated genus and species (which is the organism's scientific name plus an
44                      identifying string).</Notes>                      identifying string).</Notes>
                     <DataGen pass="2">join('; ', (RandParam('bacteria', 'archaea', 'eukaryote', 'virus', 'environmental'),  
                                                   ListGen('PKVKVKVK', 5), $this->{genus}, $this->{species}))</DataGen>  
45                  </Field>                  </Field>
46                  <Field name="group-name" type="name-string" relation="GenomeGroups">                  <Field name="primary-group" type="name-string">
47                      <Notes>The group identifies a special grouping of organisms that would be displayed on a particular                      <Notes>The primary NMPDR group for this organism. There is always exactly one NMPDR group
48                      page or of particular interest to a research group or web site. A single genome can belong to multiple                      (either based on the organism name or the default value "Supporting"), whereas there can be
49                      such groups or none at all.</Notes>                      multiple named groups or even none.</Notes>
50                  </Field>                  </Field>
51              </Fields>              </Fields>
52              <Indexes>              <Indexes>
# Line 54  Line 61 
61                          <IndexField name="unique-characterization" order="ascending" />                          <IndexField name="unique-characterization" order="ascending" />
62                      </IndexFields>                      </IndexFields>
63                  </Index>                  </Index>
64                  <Index Unique="false">                  <Index>
65                        <Notes>This index allows the applications to find all genomes associated with
66                        a specific primary (NMPDR) group.</Notes>
67                        <IndexFields>
68                            <IndexField name="primary-group" order="ascending" />
69                            <IndexField name="genus" order="ascending" />
70                            <IndexField name="species" order="ascending" />
71                            <IndexField name="unique-characterization" order="ascending" />
72                        </IndexFields>
73                    </Index>
74                    <Index>
75                      <Notes>This index allows the applications to find all genomes for a particular                      <Notes>This index allows the applications to find all genomes for a particular
76                      species.</Notes>                      species.</Notes>
77                      <IndexFields>                      <IndexFields>
# Line 65  Line 82 
82                  </Index>                  </Index>
83              </Indexes>              </Indexes>
84          </Entity>          </Entity>
85            <Entity name="CDD" keyType="key-string">
86                <Notes>A CDD is a protein domain designator. It represents the shape of a molecular unit
87                on a feature's protein. The ID is a six-digit string assigned by the public Conserved Domain
88                Database. A CDD can occur on multiple features and a feature generally has multiple CDDs.</Notes>
89            </Entity>
90          <Entity name="Source" keyType="medium-string">          <Entity name="Source" keyType="medium-string">
91              <Notes>A [i]source[/i] describes a place from which genome data was taken. This can be an organization              <Notes>A [i]source[/i] describes a place from which genome data was taken. This can be an organization
92              or a paper citation.</Notes>              or a paper citation.</Notes>
93              <Fields>              <Fields>
94                  <Field name="URL" type="string" relation="SourceURL">                  <Field name="URL" type="string" relation="SourceURL">
95                      <Notes>URL of the paper cited or of the organization's web site. This field is optional.</Notes>                      <Notes>URL of the paper cited or of the organization's web site. This field is optional.</Notes>
                     <DataGen>"http://www.conservativecat.com/Ferdy/TestTarget.php?Source=" . $this->{id}</DataGen>  
96                  </Field>                  </Field>
97                  <Field name="description" type="text">                  <Field name="description" type="text">
98                      <Notes>Description of the source. The description can be a street address or a citation.</Notes>                      <Notes>Description of the source. The description can be a street address or a citation.</Notes>
                     <DataGen>$this->{id} . ': ' . StringGen(IntGen(50,200))</DataGen>  
99                  </Field>                  </Field>
100              </Fields>              </Fields>
101          </Entity>          </Entity>
# Line 98  Line 118 
118                  <Field name="sequence" type="text">                  <Field name="sequence" type="text">
119                      <Notes>String consisting of the residues. Each residue is described by a single                      <Notes>String consisting of the residues. Each residue is described by a single
120                      character in the string.</Notes>                      character in the string.</Notes>
                     <DataGen>RandChars("ACGT", IntGen(100,400))</DataGen>  
121                  </Field>                  </Field>
122                  <Field name="quality-vector" type="text">                  <Field name="quality-vector" type="text">
123                      <Notes>String describing the quality data for each base pair. Individual values will                      <Notes>String describing the quality data for each base pair. Individual values will
# Line 107  Line 126 
126                      10^-30. A higher quality number indicates a better chance of a correct match. It is possible                      10^-30. A higher quality number indicates a better chance of a correct match. It is possible
127                      that the quality data is not known for a sequence. If that is the case, the quality                      that the quality data is not known for a sequence. If that is the case, the quality
128                      vector will contain the string [b]unknown[/b].</Notes>                      vector will contain the string [b]unknown[/b].</Notes>
                     <DataGen>unknown</DataGen>  
129                  </Field>                  </Field>
130              </Fields>              </Fields>
131          </Entity>          </Entity>
132          <Entity name="Feature" keyType="name-string">          <Entity name="Feature" keyType="id-string">
133              <Notes>A [i]feature[/i] is a part of a genome that is of special interest. Features              <Notes>A [i]feature[/i] is a part of a genome that is of special interest. Features
134              may be spread across multiple contigs of a genome, but never across more than              may be spread across multiple contigs of a genome, but never across more than
135              one genome. Features can be assigned to roles via spreadsheet cells,              one genome. Features can be assigned to roles via spreadsheet cells,
136              and are the targets of annotation.</Notes>              and are the targets of annotation.</Notes>
137              <Fields>              <Fields>
138                  <Field name="feature-type" type="string">                  <Field name="feature-type" type="id-string">
139                      <Notes>Code indicating the type of this feature.</Notes>                      <Notes>Code indicating the type of this feature.</Notes>
                     <DataGen>RandParam('peg','rna')</DataGen>  
                 </Field>  
                 <Field name="alias" type="medium-string" relation="FeatureAlias">  
                     <Notes>Alternative name for this feature. A feature can have many aliases.</Notes>  
                     <DataGen testCount="3">StringGen('Pgi|99999', 'Puni|XXXXXX', 'PAAAAAA999')</DataGen>  
140                  </Field>                  </Field>
141                  <Field name="translation" type="text" relation="FeatureTranslation">                  <Field name="translation" type="text" relation="FeatureTranslation">
142                      <Notes>[i](optional)[/i] A translation of this feature's residues into character                      <Notes>[i](optional)[/i] A translation of this feature's residues into character
143                      codes, formed by concatenating the pieces of the feature together. For a                      codes, formed by concatenating the pieces of the feature together. For a
144                      protein encoding group, this is the protein characters. For other types                      protein encoding group, this is the protein characters. For other types
145                      it is the DNA characters.</Notes>                      it is the DNA characters.</Notes>
                     <DataGen testCount="0"></DataGen>  
146                  </Field>                  </Field>
147                  <Field name="upstream-sequence" type="text" relation="FeatureUpstream">                  <Field name="upstream-sequence" type="text" relation="FeatureUpstream">
148                      <Notes>Upstream sequence of the feature. This includes residues preceding the feature as well as some of                      <Notes>Upstream sequence of the feature. This includes residues preceding the feature as well as some of
149                      the feature's initial residues.</Notes>                      the feature's initial residues.</Notes>
150                      <DataGen testCount="0"></DataGen>                  </Field>
151                    <Field name="assignment" type="text">
152                        <Notes>Default functional assignment for this feature.</Notes>
153                  </Field>                  </Field>
154                  <Field name="active" type="boolean">                  <Field name="active" type="boolean">
155                      <Notes>TRUE if this feature is still considered valid, FALSE if it has been logically deleted.</Notes>                      <Notes>TRUE if this feature is still considered valid, FALSE if it has been logically deleted.</Notes>
156                      <DataGen>1</DataGen>                  </Field>
157                    <Field name="assignment-maker" type="name-string">
158                        <Notes>name of the user who made the functional assignment</Notes>
159                    </Field>
160                    <Field name="assignment-quality" type="char">
161                        <Notes>quality of the functional assignment, usually a space, but may be W (indicating weak) or X
162                        (indicating experimental)</Notes>
163                    </Field>
164                    <Field name="keywords" type="text" searchable="1">
165                        <Notes>This is a list of search keywords for the feature. It includes the
166                        functional assignment, subsystem roles, and special properties.</Notes>
167                  </Field>                  </Field>
168                  <Field name="link" type="text" relation="FeatureLink">                  <Field name="link" type="text" relation="FeatureLink">
169                      <Notes>Web hyperlink for this feature. A feature can have no hyperlinks or it can have many. The                      <Notes>Web hyperlink for this feature. A feature can have no hyperlinks or it can have many. The
170                      links are to other websites that have useful information about the gene that the feature represents, and                      links are to other websites that have useful information about the gene that the feature represents, and
171                      are coded as raw HTML, using [b]&lt;a href="[i]link[/i]"&gt;[i]text[/i]&lt;/a&gt;[/b] notation.</Notes>                      are coded as raw HTML, using [b]&lt;a href="[i]link[/i]"&gt;[i]text[/i]&lt;/a&gt;[/b] notation.</Notes>
172                      <DataGen testCount="3">'http://www.conservativecat.com/Ferdy/TestTarget.php?Source=' . $this->{id} .                  </Field>
173                      "&amp;Number=" . IntGen(1,99)</DataGen>                  <Field name="conservation" type="float" relation="FeatureConservation">
174                        <Notes>A number between 0 and 1 that indicates the degree to which this feature's DNA is
175                        conserved in related genomes. A value of 1 indicates perfect conservation. A value less
176                        than 1 is a reflection of the degree to which gap characters interfere in the alignment
177                        between the feature and its close relatives.</Notes>
178                    </Field>
179                    <Field name="essential" type="text" relation="FeatureEssential" special="property_search">
180                        <Notes>A value indicating the essentiality of the feature, coded as HTML. In most
181                        cases, this will be a word describing whether the essentiality is confirmed (essential)
182                        or potential (potential-essential), hyperlinked to the document from which the
183                        essentiality was curated. If a feature is not essential, this field will have no
184                        values; otherwise, it may have multiple values.</Notes>
185                    </Field>
186                    <Field name="virulent" type="text" relation="FeatureVirulent" special="property_search">
187                        <Notes>A value indicating the virulence of the feature, coded as HTML. In most
188                        cases, this will be a phrase or SA number hyperlinked to the document from which
189                        the virulence information was curated. If the feature is not virulent, this field
190                        will have no values; otherwise, it may have multiple values.</Notes>
191                    </Field>
192                    <Field name="cello" type="name-string">
193                        <Notes>The cello value specifies the expected location of the protein: cytoplasm,
194                        cell wall, inner membrane, and so forth.</Notes>
195                    </Field>
196                    <Field name="iedb" type="text" relation="FeatureIEDB" special="property_search">
197                        <Notes>A value indicating whether or not the feature can be found in the
198                        Immune Epitope Database. If the feature has not been matched to that database,
199                        this field will have no values. Otherwise, it will have an epitope name and/or
200                        sequence, hyperlinked to the database.</Notes>
201                    </Field>
202                    <Field name="location-string" type="text">
203                        <Notes>Location of the feature, expressed as a comma-delimited list of Sprout location
204                        strings. This gives us a fast mechanism for extracting the feature location. Otherwise,
205                        we have to painstakingly paste together the IsLocatedIn records, which are themselves
206                        designed to help look for genes in a particular region rather than to find the location
207                        of a gene.</Notes>
208                  </Field>                  </Field>
209              </Fields>              </Fields>
210              <Indexes>              <Indexes>
211                  <Index>                  <Index>
212                      <Notes>This index allows the user to find the feature corresponding to                      <Notes>This index allows us to locate a feature by its CELLO value.</Notes>
                     the specified alias name.</Notes>  
213                      <IndexFields>                      <IndexFields>
214                          <IndexField name="alias" order="ascending" />                          <IndexField name="cello" order="ascending" />
215                      </IndexFields>                      </IndexFields>
216                  </Index>                  </Index>
217              </Indexes>              </Indexes>
218          </Entity>          </Entity>
219            <Entity name="FeatureAlias" keyType="medium-string">
220                <Notes>Alternative names for features. A feature can have many aliases. In general,
221                each alias corresponds to only one feature, but there are exceptions; this is not strictly enforced.</Notes>
222            </Entity>
223            <Entity name="SynonymGroup" keyType="id-string">
224                <Notes>A [i]synonym group[/i] represents a group of features. Substantially identical features
225                are mapped to the same synonym group, and this information is used to expand similarities.</Notes>
226            </Entity>
227          <Entity name="Role" keyType="string">          <Entity name="Role" keyType="string">
228              <Notes>A [i]role[/i] describes a biological function that may be fulfilled by a feature.              <Notes>A [i]role[/i] describes a biological function that may be fulfilled by a feature.
229              One of the main goals of the database is to record the roles of the various features.</Notes>              One of the main goals of the database is to record the roles of the various features.</Notes>
230              <Fields>          </Entity>
231                  <Field name="EC" type="string" relation="RoleEC">          <Entity name="RoleEC" keyType="string">
232                      <Notes>EC code for this role.</Notes>              <Notes>EC code for a role.</Notes>
                     <DataGen testCount="1">StringGen(IntGen(20,40)) . "(" . $this->{id} . ")"</DataGen>  
                 </Field>  
                 <Field name="abbr" type="name-string">  
                     <Notes>Abbreviated name for the role, generally non-unique, but useful  
                     in column headings for HTML tables.</Notes>  
                 </Field>  
             </Fields>  
             <Indexes>  
                 <Index>  
                     <Notes>This index allows the user to find the role corresponding to  
                     an EC number.</Notes>  
                     <IndexFields>  
                         <IndexField name="EC" order="ascending" />  
                     </IndexFields>  
                 </Index>  
             </Indexes>  
233          </Entity>          </Entity>
234          <Entity name="Annotation" keyType="name-string">          <Entity name="Annotation" keyType="name-string">
235              <Notes>An [i]annotation[/i] contains supplementary information about a feature. Annotations              <Notes>An [i]annotation[/i] contains supplementary information about a feature. Annotations
236              are currently the only objects that may be inserted directly into the database. All other              are currently the only objects that may be inserted directly into the database. All other
237              information is loaded from data exported by the SEED.              information is loaded from data exported by the SEED.</Notes>
             [p]Each annotation is associated with a target [b]Feature[/b]. The key of the annotation  
             is the target feature ID followed by a timestamp.</Notes>  
238              <Fields>              <Fields>
239                  <Field name="time" type="date">                  <Field name="time" type="date">
240                      <Notes>Date and time of the annotation.</Notes>                      <Notes>Date and time of the annotation.</Notes>
# Line 196  Line 243 
243                      <Notes>Text of the annotation.</Notes>                      <Notes>Text of the annotation.</Notes>
244                  </Field>                  </Field>
245              </Fields>              </Fields>
246                <Indexes>
247                    <Index>
248                        <Notes>This index allows the user to find recent annotations.</Notes>
249                        <IndexFields>
250                            <IndexField name="time" order="descending" />
251                        </IndexFields>
252                    </Index>
253                </Indexes>
254          </Entity>          </Entity>
255          <Entity name="Reaction" keyType="key-string">          <Entity name="Reaction" keyType="key-string">
256              <Notes>A [i]reaction[/i] is a chemical process catalyzed by a protein. The reaction ID              <Notes>A [i]reaction[/i] is a chemical process catalyzed by a protein. The reaction ID
# Line 214  Line 269 
269              <Notes>A [i]compound[/i] is a chemical that participates in a reaction.              <Notes>A [i]compound[/i] is a chemical that participates in a reaction.
270              All compounds have a unique ID and may also have one or more names.</Notes>              All compounds have a unique ID and may also have one or more names.</Notes>
271              <Fields>              <Fields>
272                  <Field name="name-priority" type="int" relation="CompoundName">                  <Field name="label" type="string">
                     <Notes>Priority of a compound name. The name with the loweset  
                     priority is the main name of this compound.</Notes>  
                 </Field>  
                 <Field name="name" type="name-string" relation="CompoundName">  
                     <Notes>Descriptive name for the compound. A compound may  
                     have several names.</Notes>  
                 </Field>  
                 <Field name="cas-id" type="name-string" relation="CompoundCAS">  
                     <Notes>Chemical Abstract Service ID for this compound (optional).</Notes>  
                 </Field>  
                 <Field name="label" type="name-string">  
273                      <Notes>Name used in reaction display strings.                      <Notes>Name used in reaction display strings.
274                      It is the same as the name possessing a priority of 1, but it is placed                      It is the same as the name possessing a priority of 1, but it is placed
275                      here to speed up the query used to create the display strings.</Notes>                      here to speed up the query used to create the display strings.</Notes>
276                  </Field>                  </Field>
277              </Fields>              </Fields>
278              <Indexes>          </Entity>
279                  <Index>          <Entity name="CompoundName" keyType="string">
280                      <Notes>This index allows the user to find the compound corresponding to              <Notes>A [i]compound name[/i] is a common name for the chemical represented by a
281                      the specified name.</Notes>              compound.</Notes>
282                      <IndexFields>          </Entity>
283                          <IndexField name="name" order="ascending" />          <Entity name="CompoundCAS" keyType="name-string">
284                      </IndexFields>              <Notes>This entity represents the Chemical Abstract Service ID for a compound. Each
285                  </Index>              Compound has at most one CAS ID.</Notes>
                 <Index>  
                     <Notes>This index allows the user to find the compound corresponding to  
                     the specified CAS ID.</Notes>  
                     <IndexFields>  
                         <IndexField name="cas-id" order="ascending" />  
                     </IndexFields>  
                 </Index>  
                 <Index>  
                     <Notes>This index allows the user to access the compound names in  
                     priority order.</Notes>  
                     <IndexFields>  
                         <IndexField name="id" order="ascending" />  
                         <IndexField name="name-priority" order="ascending" />  
                     </IndexFields>  
                 </Index>  
             </Indexes>  
286          </Entity>          </Entity>
287          <Entity name="Subsystem" keyType="string">          <Entity name="Subsystem" keyType="string">
288              <Notes>A [i]subsystem[/i] is a collection of roles that work together in a cell. Identification of subsystems              <Notes>A [i]subsystem[/i] is a collection of roles that work together in a cell. Identification of subsystems
# Line 266  Line 294 
294                  <Field name="notes" type="text">                  <Field name="notes" type="text">
295                      <Notes>Descriptive notes about the subsystem.</Notes>                      <Notes>Descriptive notes about the subsystem.</Notes>
296                  </Field>                  </Field>
297                    <Field name="classification" type="string" relation="SubsystemClass">
298                        <Notes>Classification string, colon-delimited. This string organizes the
299                        subsystems into a hierarchy.</Notes>
300                    </Field>
301              </Fields>              </Fields>
302          </Entity>          </Entity>
303          <Entity name="RoleSubset" keyType="string">          <Entity name="RoleSubset" keyType="string">
# Line 279  Line 311 
311              strings. The ID of the parent subsystem is prefixed to the subset ID in order              strings. The ID of the parent subsystem is prefixed to the subset ID in order
312              to make it unique.</Notes>              to make it unique.</Notes>
313          </Entity>          </Entity>
314          <Entity name="SSCell" keyType="medium-string">          <Entity name="SSCell" keyType="hash-string">
315              <Notes>Part of the process of locating and assigning features is creating a spreadsheet of              <Notes>Part of the process of locating and assigning features is creating a spreadsheet of
316              genomes and roles to which features are assigned. A [i]spreadsheet cell[/i] represents one              genomes and roles to which features are assigned. A [i]spreadsheet cell[/i] represents one
317              of the positions on the spreadsheet.</Notes>              of the positions on the spreadsheet.</Notes>
# Line 295  Line 327 
327                      <Notes>Access code possessed by this                      <Notes>Access code possessed by this
328                      user. A user can have many access codes; a genome is accessible to the user if its                      user. A user can have many access codes; a genome is accessible to the user if its
329                      access code matches any one of the user's access codes.</Notes>                      access code matches any one of the user's access codes.</Notes>
                     <DataGen testCount="2">RandParam('low', 'medium', 'high')</DataGen>  
330                  </Field>                  </Field>
331              </Fields>              </Fields>
332          </Entity>          </Entity>
# Line 360  Line 391 
391                      </Field>                      </Field>
392                  </Fields>                  </Fields>
393          </Entity>          </Entity>
394          <Entity name="Coupling" keyType="medium-string">          <Entity name="Family" keyType="id-string">
395              <Notes>A coupling is a relationship between two features. The features are              <Notes>A family is a group of homologous PEGs believed to have the same function. Protein
396              physically close on the contig, and there is evidence that they generally              families provide a mechanism for verifying the accuracy of functional assignments
397              belong together. The key of this entity is formed by combining the coupled              and are also used in determining phylogenetic trees.</Notes>
398              feature IDs with a space.</Notes>              <Fields>
399              <Fields>                  <Field name="function" type="text">
400                  <Field name="score" type="int">                      <Notes>The functional assignment expected for all PEGs in this family.</Notes>
401                      <Notes>A number based on the set of PCHs (pairs of close homologs). A PCH                  </Field>
402                      indicates that two genes near each other on one genome are very similar to                  <Field name="size" type="int">
403                      genes near each other on another genome. The score only counts PCHs for which                      <Notes>The number of proteins in this family. This may be larger than the
404                      the genomes are very different. (In other words, we have a pairing that persists                      number of PEGs included in the family, since the family may also contain external
405                      between different organisms.) A higher score implies a stronger meaning to the                      IDs.</Notes>
406                      clustering.</Notes>                  </Field>
407                </Fields>
408            </Entity>
409            <Entity name="PDB" keyType="id-string">
410                <Notes>A PDB is a protein database containing information that can be used to determine
411                the shape of the protein and the energies required to dock with it. The ID is the
412                four-character name used on the PDB web site.</Notes>
413                <Fields>
414                    <Field name="docking-count" type="int">
415                        <Notes>The number of ligands that have been docked against this PDB.</Notes>
416                  </Field>                  </Field>
417              </Fields>              </Fields>
418                <Indexes>
419                    <Index>
420                        <IndexFields>
421                            <IndexField name="docking-count" order="descending" />
422                            <IndexField name="id" order="ascending" />
423                        </IndexFields>
424                    </Index>
425                </Indexes>
426          </Entity>          </Entity>
427          <Entity name="PCH" keyType="string">          <Entity name="Ligand" keyType="id-string">
428              <Notes>A PCH (physically close homolog) connects a clustering (which is a              <Notes>A Ligand is a chemical of interest in computing docking energies against a PDB.
429              pair of physically close features on a contig) to a second pair of physically              The ID of the ligand is an 8-digit ZINC ID number.</Notes>
430              close features that are similar to the first. Essentially, the PCH is a              <Fields>
431              relationship between two clusterings in which the first clustering's features                  <Field name="name" type="long-string">
432              are similar to the second clustering's features. The simplest model for                      <Notes>Chemical name of this ligand.</Notes>
             this would be to simply relate clusterings to each other; however, not all  
             physically close pairs qualify as clusterings, so we relate a clustering to  
             a pair of features. The key is the clustering key followed by the IDs  
             of the features in the second pair.</Notes>  
             <Fields>  
                 <Field name="used" type="boolean">  
                     <Notes>TRUE if this PCH is used in scoring the attached clustering,  
                     else FALSE. If a clustering has a PCH for a particular genome and many  
                     similar genomes are present, then a PCH will probably exist for the  
                     similar genomes as well. When this happens, only one of the PCHs will  
                     be scored: the others are considered duplicates of the same evidence.</Notes>  
433                  </Field>                  </Field>
434              </Fields>              </Fields>
435          </Entity>          </Entity>
436      </Entities>      </Entities>
437      <Relationships>      <Relationships>
438          <Relationship name="ParticipatesInCoupling" from="Feature" to="Coupling" arity="MM">          <Relationship name="IsPresentOnProteinOf" from="CDD" to="Feature" arity="MM">
439              <Notes>This relationship connects a feature to all the functional couplings              <Notes>This relationship connects a feature to its CDD protein domains. The
440              in which it participates. A functional coupling is a recognition of the fact              match score is included as intersection data.</Notes>
441              that the features are close to each other on a chromosome, and similar              <Fields>
442              features in other genomes also tend to be close.</Notes>                  <Field name="score" type="float">
443              <Fields>                      <Notes>This is the match score between the feature and the CDD. A
444                  <Field name="pos" type="int">                      lower score is a better match.</Notes>
445                      <Notes>Ordinal position of the feature in the coupling. Currently,                  </Field>
446                      this is either "1" or "2".</Notes>              </Fields>
447                <FromIndex>
448                    <IndexFields>
449                        <IndexField name="score" order="ascending" />
450                    </IndexFields>
451                </FromIndex>
452            </Relationship>
453            <Relationship name="IsIdentifiedByCAS" from="Compound" to="CompoundCAS" arity="MM">
454                <Notes>Relates a compound's CAS ID to the compound itself. Every CAS ID is
455                associated with a compound, and some are associated with two compounds, but not
456                all compounds have CAS IDs.</Notes>
457            </Relationship>
458            <Relationship name="IsIdentifiedByEC" from="Role" to="RoleEC" arity="MM">
459                <Notes>Relates a role to its EC number. Every EC number is associated with a
460                role, but not all roles have EC numbers.</Notes>
461            </Relationship>
462            <Relationship name="IsAliasOf" from="FeatureAlias" to="Feature" arity="MM">
463                <Notes>Connects an alias to the feature it represents. Every alias connects
464                to at least 1 feature, and a feature connects to many aliases.</Notes>
465            </Relationship>
466            <Relationship name="HasCompoundName" from="Compound" to="CompoundName" arity="MM">
467                <Notes>Connects a compound to its names. A compound generally has several
468                names.</Notes>
469                <Fields>
470                    <Field name="priority" type="int">
471                        <Notes>Priority of this name, with 1 being the highest priority, 2
472                        the next highest, and so forth.</Notes>
473                    </Field>
474                </Fields>
475                <FromIndex>
476                    <Notes>This index enables the application to view the names of a compound
477                    in priority order.</Notes>
478                    <IndexFields>
479                        <IndexField name="priority" order="ascending" />
480                    </IndexFields>
481                </FromIndex>
482            </Relationship>
483            <Relationship name="IsProteinForFeature" from="PDB" to="Feature" arity="MM">
484                <Notes>Relates a PDB to features that produce highly similar proteins.</Notes>
485                <Fields>
486                    <Field name="score" type="float">
487                        <Notes>Similarity score for the comparison between the feature and
488                        the PDB protein. A lower score indicates a better match.</Notes>
489                    </Field>
490                    <Field name="start-location" type="int">
491                        <Notes>Starting location within the feature of the matching region.</Notes>
492                    </Field>
493                    <Field name="end-location" type="int">
494                        <Notes>Ending location within the feature of the matching region.</Notes>
495                  </Field>                  </Field>
496              </Fields>              </Fields>
497              <ToIndex>              <ToIndex>
498                    <Notes>This index enables the application to view the PDBs of a
499                    feature in order from the closest match to the furthest.</Notes>
500                    <IndexFields>
501                        <IndexField name="score" order="ascending" />
502                    </IndexFields>
503                </ToIndex>
504                <FromIndex>
505                  <Notes>This index enables the application to view the features of                  <Notes>This index enables the application to view the features of
506                  a coupling in the proper order. The order influences the way the                  a PDB in order from the closest match to the furthest.</Notes>
                 PCHs are examined.</Notes>  
507                  <IndexFields>                  <IndexFields>
508                      <IndexField name="pos" order="ascending" />                      <IndexField name="score" order="ascending" />
509                  </IndexFields>                  </IndexFields>
510                </FromIndex>
511            </Relationship>
512            <Relationship name="DocksWith" from="PDB" to="Ligand" arity="MM">
513                <Notes>Indicates that a docking result exists between a PDB and a ligand. The
514                docking result describes the energy required for the ligand to dock with
515                the protein described by the PDB. A lower energy indicates the ligand has a
516                good chance of disabling the protein. At the current time, only the best
517                docking results are kept.</Notes>
518                <Fields>
519                    <Field name="reason" type="id-string">
520                        <Notes>Indication of the reason for determining the docking result.
521                        A value of [b]Random[/b] indicates the docking was attempted as a part
522                        of a random survey used to determine the docking characteristics of the
523                        PDB. A value of [b]Rich[/b] indicates the docking was attempted because
524                        a low-energy docking result was predicted for the ligand with respect
525                        to the PDB.</Notes>
526                    </Field>
527                    <Field name="tool" type="id-string">
528                        <Notes>Name of the tool used to produce the docking result.</Notes>
529                    </Field>
530                    <Field name="total-energy" type="float">
531                        <Notes>Total energy required for the ligand to dock with the PDB
532                        protein, in kcal/mol. A negative value means energy is released.</Notes>
533                    </Field>
534                    <Field name="vanderwalls-energy" type="float">
535                        <Notes>Docking energy in kcal/mol that results from the geometric fit
536                        (Van der Waals force) between the PDB and the ligand.</Notes>
537                    </Field>
538                    <Field name="electrostatic-energy" type="float">
539                        <Notes>Docking energy in kcal/mol that results from the movement of
540                        electrons (electrostatic force) between the PDB and the ligand.</Notes>
541                    </Field>
542                </Fields>
543                <FromIndex>
544                    <Notes>This index enables the application to view a PDB's docking results from
545                    the lowest energy (best docking) to highest energy (worst docking).</Notes>
546                    <IndexFields>
547                        <IndexField name="total-energy" order="ascending" />
548                    </IndexFields>
549                </FromIndex>
550                <ToIndex>
551                    <Notes>This index enables the application to view a ligand's docking results from
552                    the lowest energy (best docking) to highest energy (worst docking). Note that
553                    since we only keep the best docking results for a PDB, this index is not likely
554                    to provide useful results.</Notes>
555              </ToIndex>              </ToIndex>
556          </Relationship>          </Relationship>
557          <Relationship name="IsEvidencedBy" from="Coupling" to="PCH" arity="1M">          <Relationship name="IsFamilyForFeature" from="Family" to="Feature" arity="MM">
558              <Notes>This relationship connects a functional coupling to the physically              <Notes>This relationship connects a protein family to all of its PEGs and connects
559              close homologs (PCHs) which affirm that the coupling is meaningful.</Notes>              each PEG to all of its protein families.</Notes>
560          </Relationship>          </Relationship>
561          <Relationship name="UsesAsEvidence" from="PCH" to="Feature" arity="MM">          <Relationship name="IsSynonymGroupFor" from="SynonymGroup" to="Feature" arity="MM">
562              <Notes>This relationship connects a PCH to the features that represent its              <Notes>This relation connects a synonym group to the features that make it
563              evidence. Each PCH is connected to a parent coupling that relates two features              up.</Notes>
564              on a specific genome. The PCH's evidence that the parent coupling is functional          </Relationship>
565              is the existence of two physically close features on a different genome that          <Relationship name="HasFeature" from="Genome" to="Feature" arity="1M">
566              correspond to the features in the coupling. Those features are found on the              <Notes>This relationship connects a genome to all of its features. This
567              far side of this relationship.</Notes>              relationship is redundant in a sense, because the genome ID is part
568              <Fields>              of the feature ID; however, it makes the creation of certain queries more
569                  <Field name="pos" type="int">              convenient because you can drag in filtering information for a feature's
570                      <Notes>Ordinal position of the feature in the coupling that corresponds              genome.</Notes>
571                      to our target feature. There is a one-to-one correspondence between the              <Fields>
572                      features connected to the PCH by this relationship and the features                  <Field name="type" type="key-string">
573                      connected to the PCH's parent coupling. The ordinal position is used                      <Notes>Feature type (e.g. peg, rna)</Notes>
                     to decode that relationship. Currently, this field is either "1" or  
                     "2".</Notes>  
574                  </Field>                  </Field>
575              </Fields>              </Fields>
576              <FromIndex>              <FromIndex>
577                  <Notes>This index enables the application to view the features of                  <Notes>This index enables the application to view the features of a
578                  a PCH in the proper order.</Notes>                  Genome sorted by type.</Notes>
579                  <IndexFields>                  <IndexFields>
580                      <IndexField name="pos" order="ascending" />                      <IndexField name="type" order="ascending" />
581                  </IndexFields>                  </IndexFields>
582              </FromIndex>              </FromIndex>
583          </Relationship>          </Relationship>
# Line 509  Line 643 
643          <Relationship name="OccursInSubsystem" from="Role" to="Subsystem" arity="MM">          <Relationship name="OccursInSubsystem" from="Role" to="Subsystem" arity="MM">
644              <Notes>This relationship connects roles to the subsystems that implement them. </Notes>              <Notes>This relationship connects roles to the subsystems that implement them. </Notes>
645              <Fields>              <Fields>
646                    <Field name="abbr" type="name-string">
647                        <Notes>Abbreviated name for the role, generally non-unique, but useful
648                        in column headings for HTML tables.</Notes>
649                    </Field>
650                  <Field name="column-number" type="int">                  <Field name="column-number" type="int">
651                      <Notes>Column number for this role in the specified subsystem's                      <Notes>Column number for this role in the specified subsystem's
652                      spreadsheet.</Notes>                      spreadsheet.</Notes>
# Line 628  Line 766 
766                      [b]-[/b] if it is backward.</Notes>                      [b]-[/b] if it is backward.</Notes>
767                  </Field>                  </Field>
768              </Fields>              </Fields>
769              <FromIndex Unique="false">              <FromIndex>
770                  <Notes>This index allows the application to find all the segments of a feature in                  <Notes>This index allows the application to find all the segments of a feature in
771                  the proper order.</Notes>                  the proper order.</Notes>
772                  <IndexFields>                  <IndexFields>
# Line 643  Line 781 
781                  </IndexFields>                  </IndexFields>
782              </ToIndex>              </ToIndex>
783          </Relationship>          </Relationship>
         <Relationship name="IsBidirectionalBestHitOf" from="Feature" to="Feature" arity="MM">  
             <Notes>This relationship is one of two that relate features to each other. It  
             connects features that are very similar but on separate genomes. A  
             bidirectional best hit relationship exists between two features [b]A[/b]  
             and [b]B[/b] if [b]A[/b] is the best match for [b]B[/b] on [b]A[/b]'s genome  
             and [b]B[/b] is the best match for [b]A[/b] on [b]B[/b]'s genome. </Notes>  
             <Fields>  
                 <Field name="genome" type="name-string">  
                     <Notes>ID of the genome containing the target (to) feature.</Notes>  
                 </Field>  
                 <Field name="sc" type="float">  
                     <Notes>score for this relationship</Notes>  
                 </Field>  
             </Fields>  
             <FromIndex>  
                 <Notes>This index allows the application to find a feature's best hit for  
                 a specific target genome.</Notes>  
                 <IndexFields>  
                     <IndexField name="genome" order="ascending" />  
                 </IndexFields>  
             </FromIndex>  
         </Relationship>  
784          <Relationship name="HasProperty" from="Feature" to="Property" arity="MM">          <Relationship name="HasProperty" from="Feature" to="Property" arity="MM">
785              <Notes>This relationship connects a feature to its known property values.              <Notes>This relationship connects a feature to its known property values.
786              The relationship contains text data that indicates the paper or organization              The relationship contains text data that indicates the paper or organization
# Line 730  Line 846 
846              chemical reactions. A single reaction can be triggered by many roles,              chemical reactions. A single reaction can be triggered by many roles,
847              and a role can trigger many reactions.</Notes>              and a role can trigger many reactions.</Notes>
848          </Relationship>          </Relationship>
849            <Relationship name="HasRoleInSubsystem" from="Feature" to="Subsystem" arity="MM">
850                <Notes>This relationship connects a feature to the subsystems in which it
851                participates. This is technically redundant information, but it is used
852                so often that it deserves its own table.</Notes>
853                <Fields>
854                    <Field name="genome" type="name-string">
855                        <Notes>ID of the genome containing the feature</Notes>
856                    </Field>
857                    <Field name="type" type="key-string">
858                        <Notes>Feature type (e.g. peg, rna)</Notes>
859                    </Field>
860                </Fields>
861                <ToIndex>
862                    <Notes>This index enables the application to view the features of a
863                    subsystem sorted by genome and feature type.</Notes>
864                    <IndexFields>
865                        <IndexField name="genome" order="ascending" />
866                        <IndexField name="type" order="ascending" />
867                    </IndexFields>
868                </ToIndex>
869            </Relationship>
870      </Relationships>      </Relationships>
871  </Database>  </Database>

Legend:
Removed from v.1.23  
changed lines
  Added in v.1.51
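
Several relationships added in v.1.51 (DocksWith, HasRoleInSubsystem, IsProteinForFeature) declare FromIndex or ToIndex blocks whose IndexField order determines how the application sees the linked records. The following is a minimal sketch of how such declarations could be listed for review. It is not part of the Sprout code base: it assumes only Python's standard xml.etree.ElementTree module, the function name relationship_indexes is hypothetical, and the fragment is trimmed from the v.1.51 side of the diff above.

```python
import xml.etree.ElementTree as ET

# Fragment trimmed from the added (v.1.51) side of the diff; Notes elements
# and most fields are omitted because only the index declarations matter here.
DBD_FRAGMENT = """
<Database>
  <Relationships>
    <Relationship name="DocksWith" from="PDB" to="Ligand" arity="MM">
      <Fields>
        <Field name="total-energy" type="float" />
      </Fields>
      <FromIndex>
        <IndexFields>
          <IndexField name="total-energy" order="ascending" />
        </IndexFields>
      </FromIndex>
    </Relationship>
    <Relationship name="HasRoleInSubsystem" from="Feature" to="Subsystem" arity="MM">
      <ToIndex>
        <IndexFields>
          <IndexField name="genome" order="ascending" />
          <IndexField name="type" order="ascending" />
        </IndexFields>
      </ToIndex>
    </Relationship>
  </Relationships>
</Database>
"""


def relationship_indexes(xml_text):
    """Map (relationship name, index side) to its ordered (field, order) pairs.

    Hypothetical helper for illustration; "side" is "FromIndex" or "ToIndex".
    """
    root = ET.fromstring(xml_text)
    result = {}
    for rel in root.iter("Relationship"):
        for side in ("FromIndex", "ToIndex"):
            index = rel.find(side)
            if index is None:
                continue  # this relationship declares no index on this side
            fields = [(f.get("name"), f.get("order"))
                      for f in index.iter("IndexField")]
            result[(rel.get("name"), side)] = fields
    return result


indexes = relationship_indexes(DBD_FRAGMENT)
for (name, side), fields in indexes.items():
    print(name, side, fields)
```

Running the same function over the full SproutDBD.xml text, rather than the trimmed fragment, should list every declared index, which may help when checking that each sort order matches the intent stated in its Notes element.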
