/Sprout/SproutDBD.xml, revision 1.53 ([Bio] repository)

<?xml version="1.0" encoding="utf-8" ?>
<Database>
    <Title>Sprout Genome and Subsystem Database</Title>
    <Notes>The Sprout database contains the genetic data for all complete organisms in the [[SeedEnvironment]].
        Data that is not in Sprout (attributes, similarities, and couplings) is stored on external
        servers available to the Sprout software. The Sprout database is reloaded approximately once
        per month. There is significant redundancy in the Sprout database because it has been
        optimized for searching. In particular, the Feature table contains an extra copy of the
        feature's functional role and a list of possible search terms.</Notes>
    <Entities>
        <Entity name="Genome" keyType="name-string">
            <Notes>A [[Genome]] contains the sequence data for a particular individual organism.</Notes>
            <Fields>
                <Field name="genus" type="name-string">
                    <Notes>Genus of the relevant organism.</Notes>
                </Field>
                <Field name="species" type="name-string">
                    <Notes>Species of the relevant organism.</Notes>
                </Field>
                <Field name="unique-characterization" type="medium-string">
                    <Notes>The unique characterization identifies the particular organism instance from which the
                        genome is taken. The database can contain more than one genome for a
                        particular species, and every individual organism has variations in its DNA.</Notes>
                </Field>
                <Field name="version" type="name-string">
                    <Notes>Version string for this genome, generally consisting of the genome ID followed
                        by a period and a string of digits.</Notes>
                </Field>
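<!--
Here is a minimal Python sketch for reading the version field. It assumes the format described above: genome ID, a period, then digits. A genome ID may itself contain a period, so the split happens at the last one. The function name, the pattern constant, and the sample value 83333.1.7 are hypothetical and are not part of the Sprout software.

```python
import re
from typing import Optional, Tuple

# Genome ID (which may itself contain periods), a final period, then digits.
VERSION_PATTERN = re.compile(r"(.+)\.(\d+)")

def split_genome_version(version: str) -> Optional[Tuple[str, str]]:
    """Return (genome ID, digit suffix), or None if the string does not match."""
    match = VERSION_PATTERN.fullmatch(version.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)

print(split_genome_version("83333.1.7"))  # ('83333.1', '7')
print(split_genome_version("83333"))      # None
```

Because the leading group is greedy, the regular expression backtracks only as far as the final period.
-->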
                <Field name="access-code" type="key-string">
                    <Notes>The access code field is deprecated. Its function has been replaced by
                        the account management system developed for the [[RapidAnnotationServer]].</Notes>
                </Field>
                <Field name="complete" type="boolean">
                    <Notes>TRUE if the genome is complete, else FALSE.</Notes>
                </Field>
                <Field name="dna-size" type="counter">
                    <Notes>Number of base pairs in the genome.</Notes>
                </Field>
                <Field name="taxonomy" type="text">
                    <Notes>The taxonomy string contains the full [[Wikipedia:taxonomy]] of the organism, with individual elements
                        separated by semicolons (and optional white space), starting with the domain and ending with
                        the disambiguated genus and species (which is the organism's scientific name plus an
                        identifying string).</Notes>
                </Field>
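<!--
The following Python sketch splits a taxonomy string into its elements. It relies only on the separator rule stated in the Notes: semicolons with optional white space. The function name and the sample lineage are illustrative and were not taken from Sprout data.

```python
import re
from typing import List

def parse_taxonomy(taxonomy: str) -> List[str]:
    """Split a taxonomy string into its elements, domain first.

    Elements are separated by semicolons with optional white space;
    empty elements (for example from a trailing semicolon) are dropped.
    """
    return [part for part in re.split(r"\s*;\s*", taxonomy.strip()) if part]

lineage = parse_taxonomy("Bacteria; Proteobacteria;Gammaproteobacteria; Escherichia coli K12")
print(lineage[0])   # Bacteria
print(lineage[-1])  # Escherichia coli K12
```
-->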
                <Field name="primary-group" type="name-string">
                    <Notes>The primary NMPDR group for this organism. There is always exactly one NMPDR group
                        per organism (either based on the organism name or the default value =Supporting=). In general,
                        more data is kept on organisms in NMPDR groups than on supporting organisms.</Notes>
                </Field>
                <Field name="contigs" type="int">
                    <Notes>Number of contigs for this organism.</Notes>
                </Field>
                <Field name="pegs" type="int">
                    <Notes>Number of [[protein encoding genes]] for this organism.</Notes>
                </Field>
                <Field name="rnas" type="int">
                    <Notes>Number of RNA features found for this organism.</Notes>
                </Field>
            </Fields>
            <Indexes>
                <Index>
                    <Notes>This index allows the applications to find all genomes associated with
                        a specific access code, so that a complete list of the genomes a user can view
                        can be generated.</Notes>
                    <IndexFields>
                        <IndexField name="access-code" order="ascending" />
                        <IndexField name="genus" order="ascending" />
                        <IndexField name="species" order="ascending" />
                        <IndexField name="unique-characterization" order="ascending" />
                    </IndexFields>
                </Index>
                <Index>
                    <Notes>This index allows the applications to find all genomes associated with
                        a specific primary (NMPDR) group.</Notes>
                    <IndexFields>
                        <IndexField name="primary-group" order="ascending" />
                        <IndexField name="genus" order="ascending" />
                        <IndexField name="species" order="ascending" />
                        <IndexField name="unique-characterization" order="ascending" />
                    </IndexFields>
                </Index>
                <Index>
                    <Notes>This index allows the applications to find all genomes for a particular
                        species.</Notes>
                    <IndexFields>
                        <IndexField name="genus" order="ascending" />
                        <IndexField name="species" order="ascending" />
                        <IndexField name="unique-characterization" order="ascending" />
                    </IndexFields>
                </Index>
            </Indexes>
        </Entity>
        <Entity name="CDD" keyType="key-string">
            <Notes>A CDD is a protein domain designator. It represents the shape of a molecular unit
                on a feature's protein. The ID is a six-digit string assigned by the public
                [[http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml Conserved Domain Database]]. A CDD
                can occur on multiple features, and a feature generally has multiple CDDs.</Notes>
        </Entity>
        <Entity name="Source" keyType="medium-string">
            <Notes>A _source_ describes a place from which genome data was taken. This can be an organization
                or a paper citation.</Notes>
            <Fields>
                <Field name="URL" type="string" relation="SourceURL">
                    <Notes>URL of the cited paper or of the organization's web site. This field is optional.</Notes>
                </Field>
                <Field name="description" type="text">
                    <Notes>Description of the source. The description can be a street address or a citation.</Notes>
                </Field>
            </Fields>
        </Entity>
        <Entity name="Contig" keyType="name-string">
            <Notes>A _contig_ is a contiguous run of residues. The contig's ID consists of the
                genome ID followed by a name that identifies which contig this is for the parent genome. As
                is the case with all keys in this database, the individual components are separated by a
                period. A contig can contain over a million residues. For performance reasons, therefore,
                the contig is split into multiple pieces called _sequences_. The sequences
                contain the characters that represent the residues as well as data on the quality of
                the residue identification.</Notes>
        </Entity>
        <Entity name="Sequence" keyType="name-string">
            <Notes>A _sequence_ is a continuous piece of a contig. Contigs are split into
                sequences so that the entire contig does not have to be held in memory while it is
                being manipulated. The key of the sequence is the contig ID followed by the index of
                the begin point.</Notes>
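<!--
The Python sketch below shows how the contig and sequence keys fit together. It is a hypothetical illustration. The schema does not specify a chunk size, whether begin points are 1-based, or any real genome ID and contig name, so the values used here are assumptions.

```python
from typing import Iterator, Tuple

def split_contig(contig_id: str, residues: str, chunk_size: int = 10000) -> Iterator[Tuple[str, str]]:
    """Yield (sequence key, residues) pairs for one contig.

    Each key is the contig ID, a period, and the begin point of the piece.
    The 1-based begin point and the default chunk_size are assumptions.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(residues), chunk_size):
        yield f"{contig_id}.{offset + 1}", residues[offset:offset + chunk_size]

# A contig ID is the genome ID, a period, and the contig name (both hypothetical here).
contig_id = "83333.1" + "." + "NC_000913"
for key, piece in split_contig(contig_id, "acgtacgtac", chunk_size=4):
    print(key, piece)
# 83333.1.NC_000913.1 acgt
# 83333.1.NC_000913.5 acgt
# 83333.1.NC_000913.9 ac
```

A generator yields one piece at a time, so a loader can write each Sequence record without first building the full list of pieces.
-->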
            <Fields>
                <Field name="sequence" type="text">
                    <Notes>String consisting of the residues (base pairs). Each residue is described by a single
                        character in the string.</Notes>
                </Field>
                <Field name="quality-vector" type="text">
                    <Notes>String describing the quality data for each base pair. Individual values are
                        separated by periods. Each value represents the negative exponent of the probability
                        of error. Thus, for example, a quality of 30 indicates the probability of error is
                        10^-30. A higher quality number indicates a better chance of a correct match. It is
                        possible that the quality data is not known for a sequence. If that is the case, the
                        quality vector will contain the string =unknown=.</Notes>
                </Field>
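<!--
Here is a small Python sketch that decodes a quality vector using the definition given in the Notes, where a value q means an error probability of 10^-q. The function name is hypothetical. The values are assumed to be integers because periods already serve as separators.

```python
from typing import List, Optional

def parse_quality_vector(vector: str) -> Optional[List[float]]:
    """Convert a period-separated quality vector into per-residue error probabilities.

    Uses the definition in the Notes: quality q means an error probability
    of 10**-q. Returns None when the quality data is unknown.
    """
    vector = vector.strip()
    if vector == "unknown":
        return None
    return [10.0 ** -int(value) for value in vector.split(".") if value]

print(parse_quality_vector("30.20.40"))  # three probabilities, roughly 1e-30, 1e-20 and 1e-40
print(parse_quality_vector("unknown"))   # None
```
-->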
            </Fields>
        </Entity>
        <Entity name="Feature" keyType="id-string">
            <Notes>A _feature_ (sometimes also called a [[gene]]) is a part of a genome that is of special interest. Features
                may be spread across multiple contigs of a genome, but never across more than
                one genome. Features can be assigned to roles via spreadsheet cells,
                and are the targets of annotation. Each feature in the database has a unique [[FigId]].</Notes>
            <Fields>
                <Field name="feature-type" type="id-string">
                    <Notes>Code indicating the type of this feature. Among the codes currently
                        supported are =peg= for a [[protein encoding gene]], =bs= for a
                        binding site, =opr= for an operon, and so forth.</Notes>
                </Field>
                <Field name="translation" type="text" relation="FeatureTranslation">
                    <Notes>_(optional)_ A translation of this feature's residues into character
                        codes, formed by concatenating the pieces of the feature together. For a
                        [[protein encoding gene]], the translation contains protein characters. For other types
                        it contains DNA characters.</Notes>
                </Field>
                <Field name="upstream-sequence" type="text" relation="FeatureUpstream">
                    <Notes>Upstream sequence for the feature. This includes residues preceding the feature as
                        well as some of the feature's initial residues.</Notes>
                </Field>
                <Field name="assignment" type="text">
                    <Notes>Default functional assignment for this feature.</Notes>
                </Field>
                <Field name="active" type="boolean">
                    <Notes>(This field is deprecated.) TRUE if this feature is still considered valid,
                        FALSE if it has been logically deleted.</Notes>
                </Field>
                <Field name="assignment-maker" type="name-string">
                    <Notes>Name of the user who made the functional assignment.</Notes>
                </Field>
                <Field name="assignment-quality" type="char">
                    <Notes>Quality of the functional assignment. This is usually a space, but may be W (indicating weak) or X
                        (indicating experimental).</Notes>
                </Field>
                <Field name="keywords" type="text" searchable="1">
                    <Notes>This is a list of search keywords for the feature. It includes the
                        functional assignment, subsystem roles, and special properties.</Notes>
                </Field>
                <Field name="link" type="text" relation="FeatureLink">
                    <Notes>Web hyperlink for this feature. A feature can have no hyperlinks or it can have many. The
                        links are to other websites that have useful information about the gene that the feature represents, and
                        are coded as raw HTML, using &lt;a href="_link_"&gt;_text_&lt;/a&gt; notation.</Notes>
                </Field>
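<!--
Link values are stored as raw HTML anchors, so any consumer has to parse them. The sketch below uses the html.parser module from Python's standard library. The class name, helper name, and sample URL are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

class AnchorExtractor(HTMLParser):
    """Collect (href, text) pairs from raw HTML anchor markup."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None   # href of the anchor currently open
        self._text: List[str] = []         # text seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def parse_feature_links(values: List[str]) -> List[Tuple[str, str]]:
    """Parse every link value of one feature into (URL, link text) pairs."""
    parser = AnchorExtractor()
    for value in values:
        parser.feed(value)
    parser.close()
    return parser.links

print(parse_feature_links(['<a href="https://example.org/gene/thrL">thrL at example.org</a>']))
# [('https://example.org/gene/thrL', 'thrL at example.org')]
```
-->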
                <Field name="conservation" type="float" relation="FeatureConservation">
                    <Notes>_(optional)_ A number between 0 and 1 that indicates the degree to which this feature's DNA is
                        conserved in related genomes. A value of 1 indicates perfect conservation. A value less
                        than 1 reflects the degree to which gap characters interfere in the alignment
                        between the feature and its close relatives.</Notes>
                </Field>
                <Field name="essential" type="text" relation="FeatureEssential" special="property_search">
                    <Notes>A value indicating the essentiality of the feature, coded as HTML. In most
                        cases, this will be a word describing whether the essentiality is confirmed (essential)
                        or potential (potential-essential), hyperlinked to the document from which the
                        essentiality was curated. If a feature is not essential, this field will have no
                        values; otherwise, it may have multiple values.</Notes>
                </Field>
                <Field name="virulent" type="text" relation="FeatureVirulent" special="property_search">
                    <Notes>A value indicating the virulence of the feature, coded as HTML. In most
                        cases, this will be a phrase or SA number hyperlinked to the document from which
                        the virulence information was curated. If the feature is not virulent, this field
                        will have no values; otherwise, it may have multiple values.</Notes>
                </Field>
                <Field name="cello" type="name-string">
                    <Notes>The cello value specifies the expected location of the protein: cytoplasm,
                        cell wall, inner membrane, and so forth.</Notes>
                </Field>
                <Field name="iedb" type="text" relation="FeatureIEDB" special="property_search">
                    <Notes>A value indicating whether or not the feature can be found in the
                        Immune Epitope Database. If the feature has not been matched to that database,
                        this field will have no values. Otherwise, it will have an epitope name and/or
                        sequence, hyperlinked to the database.</Notes>
                </Field>
                <Field name="location-string" type="text">
                    <Notes>Location of the feature, expressed as a comma-delimited list of Sprout location
                        strings. This gives us a fast mechanism for extracting the feature location. Otherwise,
                        we have to painstakingly paste together the [[#IsLocatedIn]] records, which are themselves
                        designed to help look for features in a particular region rather than to find the location
                        of a feature.</Notes>
                </Field>
            </Fields>
            <Indexes>
                <Index>
                    <Notes>This index allows us to locate a feature by its CELLO value.</Notes>
                    <IndexFields>
                        <IndexField name="cello" order="ascending" />
                    </IndexFields>
                </Index>
            </Indexes>
        </Entity>
        <Entity name="FeatureAlias" keyType="medium-string">
            <Notes>Alternative names for features. A feature can have many aliases. In general,
                each alias corresponds to only one feature, but there are many exceptions to this rule.</Notes>
        </Entity>
        <Entity name="SproutUser" keyType="name-string">
            <Notes>A _user_ is a person who can make annotations and view data in the database. The
                user object is keyed on the user's login name.</Notes>
            <Fields>
                <Field name="description" type="string">
                    <Notes>Full name or description of this user.</Notes>
                </Field>
                <Field name="access-code" type="key-string" relation="UserAccess">
                    <Notes>This field is deprecated.</Notes>
                </Field>
            </Fields>
        </Entity>
        <Entity name="SynonymGroup" keyType="id-string">
            <Notes>A _synonym group_ represents a group of features. Features that represent substantially
                identical proteins or DNA sequences are mapped to the same synonym group, and this information is
                used to expand similarities.</Notes>
        </Entity>
        <Entity name="Role" keyType="string">
            <Notes>A _role_ describes a biological function that may be fulfilled by a feature.
                One of the main goals of the database is to record the roles of the various features.</Notes>
        </Entity>
        <Entity name="RoleEC" keyType="string">
            <Notes>EC code for a role.</Notes>
        </Entity>
        <Entity name="Annotation" keyType="name-string">
            <Notes>An _annotation_ contains supplementary information about a feature. The most
                important type of annotation is the assignment of a [[functional role]]; however,
                other types of annotations are also possible.</Notes>
            <Fields>
                <Field name="time" type="date">
                    <Notes>Date and time of the annotation.</Notes>
                </Field>
                <Field name="annotation" type="text">
                    <Notes>Text of the annotation.</Notes>
                </Field>
            </Fields>
            <Indexes>
                <Index>
                    <Notes>This index allows the user to find recent annotations.</Notes>
                    <IndexFields>
                        <IndexField name="time" order="descending" />
                    </IndexFields>
                </Index>
            </Indexes>
        </Entity>
        <Entity name="Reaction" keyType="key-string">
            <Notes>A _reaction_ is a chemical process catalyzed by a protein. The reaction ID
                is generally a small number preceded by a letter.</Notes>
            <Fields>
                <Field name="url" type="string" relation="ReactionURL">
                    <Notes>HTML string containing a link to a web location that describes the
                        reaction. This field is optional.</Notes>
                </Field>
                <Field name="rev" type="boolean">
                    <Notes>TRUE if this reaction is reversible, else FALSE.</Notes>
                </Field>
            </Fields>
        </Entity>
        <Entity name="Compound" keyType="name-string">
            <Notes>A _compound_ is a chemical that participates in a reaction.
                All compounds have a unique ID and may also have one or more names.</Notes>
            <Fields>
                <Field name="label" type="string">
                    <Notes>Name used in reaction display strings. This is the same as the name
                        possessing a priority of 1, but it is placed here to speed up the query
                        used to create the display strings.</Notes>
                </Field>
            </Fields>
        </Entity>
        <Entity name="CompoundName" keyType="string">
            <Notes>A _compound name_ is a common name for the chemical represented by a
                compound.</Notes>
        </Entity>
        <Entity name="CompoundCAS" keyType="name-string">
            <Notes>This entity represents the [[http://www.cas.org/ Chemical Abstracts Service]] ID for a
                compound. Each Compound has at most one CAS ID.</Notes>
        </Entity>
        <Entity name="Subsystem" keyType="string">
            <Notes>A _subsystem_ is a collection of roles that work together in a cell. Identification of subsystems
                is an important tool for recognizing parallel genetic features in different organisms. See also
                [[Subsystems Approach]] and [[Subsystem]].</Notes>
            <Fields>
                <Field name="curator" type="string">
                    <Notes>Name of the person currently in charge of the subsystem.</Notes>
                </Field>
                <Field name="notes" type="text">
                    <Notes>Descriptive notes about the subsystem.</Notes>
                </Field>
                <Field name="description" type="text">
                    <Notes>Description of the subsystem's function.</Notes>
                </Field>
                <Field name="classification" type="string" relation="SubsystemClass">
                    <Notes>Classification string, colon-delimited. This string organizes the
                        subsystems into a hierarchy.</Notes>
                </Field>
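<!--
Here is a hypothetical Python sketch that turns colon-delimited classification strings into a nested hierarchy. It takes (subsystem, classification) pairs so that it does not assume exactly one classification per subsystem. The subsystem names, classifications, and the leaf key are illustrative only.

```python
from typing import Iterable, Tuple

SUBSYSTEMS_KEY = "_subsystems"  # hypothetical key for the list of subsystems at a level

def build_hierarchy(pairs: Iterable[Tuple[str, str]]) -> dict:
    """Nest subsystem names under their colon-delimited classifications.

    Each classification level becomes a nested dict, and the subsystems
    that sit at a level are listed under SUBSYSTEMS_KEY.
    """
    tree: dict = {}
    for subsystem, classification in sorted(pairs):
        node = tree
        for level in classification.split(":"):
            level = level.strip()
            if level:  # skip empty levels, e.g. from a trailing colon
                node = node.setdefault(level, {})
        node.setdefault(SUBSYSTEMS_KEY, []).append(subsystem)
    return tree

tree = build_hierarchy([
    ("Histidine Biosynthesis", "Amino Acids and Derivatives: Histidine Metabolism"),
    ("Glycine cleavage system", "Amino Acids and Derivatives: Alanine, serine, and glycine"),
])
print(sorted(tree["Amino Acids and Derivatives"]))
# ['Alanine, serine, and glycine', 'Histidine Metabolism']
```
-->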
            </Fields>
        </Entity>
        <Entity name="RoleSubset" keyType="string">
            <Notes>A _role subset_ is a named collection of roles in a particular subsystem. The
                subset names are generally very short, non-unique strings. The ID of the parent
                subsystem is prefixed to the subset ID in order to make it unique.</Notes>
        </Entity>
        <Entity name="GenomeSubset" keyType="string">
            <Notes>A _genome subset_ is a named collection of genomes that participate
                in a particular subsystem. The subset names are generally very short, non-unique
                strings. The ID of the parent subsystem is prefixed to the subset ID in order
                to make it unique.</Notes>
        </Entity>
        <Entity name="SSCell" keyType="hash-string">
            <Notes>Part of the process of [[SubsystemsApproach][subsystem annotation]] of [[features]]
                is creating a spreadsheet of genomes and roles to which features are assigned. A _spreadsheet
                cell_ represents one of the positions on the spreadsheet.</Notes>
        </Entity>
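<!--
The spreadsheet behind SSCell can be modeled with the Python sketch below. It assumes that each cell is identified by a (genome, role) pair and holds a list of features. This is a hypothetical in-memory model. It does not reproduce the hash-string keys that the SSCell entity actually uses, and all IDs are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Cell = Tuple[str, str]  # (genome ID, role name) identifies one spreadsheet cell

def build_spreadsheet(assignments: Iterable[Tuple[str, str, str]]) -> Dict[Cell, List[str]]:
    """Group (genome, role, feature) triples into spreadsheet cells.

    A cell is assumed to hold any number of features; empty cells simply
    have no entry in the returned dict.
    """
    cells: Dict[Cell, List[str]] = defaultdict(list)
    for genome, role, feature in assignments:
        cells[(genome, role)].append(feature)
    return dict(cells)

sheet = build_spreadsheet([
    ("genomeA", "roleX", "feature1"),
    ("genomeA", "roleX", "feature2"),
    ("genomeB", "roleX", "feature3"),
])
print(sheet[("genomeA", "roleX")])  # ['feature1', 'feature2']
```
-->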
        <Entity name="Property" keyType="int">
            <Notes>A _property_ is a type of assertion that can be made about a particular feature.
                Each property instance is a key/value pair and can be associated
                with many different features. Conversely, a feature can be associated with many key/value
                pairs, even some that notionally contradict each other. For example, there can be evidence
                that a feature is essential to the organism's survival and evidence that it is superfluous.</Notes>
            <Fields>
                <Field name="property-name" type="name-string">
                    <Notes>Name of this property.</Notes>
                </Field>
                <Field name="property-value" type="string">
                    <Notes>Value associated with this property. For each property
                        name, there must be a property record for each of its possible
                        values.</Notes>
                </Field>
            </Fields>
            <Indexes>
                <Index>
                    <Notes>This index enables the application to find all values for a specified property
                        name, or any given name/value pair.</Notes>
                    <IndexFields>
                        <IndexField name="property-name" order="ascending" />
                        <IndexField name="property-value" order="ascending" />
                    </IndexFields>
                </Index>
            </Indexes>
        </Entity>
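<!--
Here is a hypothetical in-memory Python model of the Property design. Each distinct name/value pair gets one integer ID, mirroring the int key. Features link to those IDs. Looking up all values of a name corresponds to scanning the (property-name, property-value) index. The class and method names are illustrative only.

```python
from typing import Dict, List, Set, Tuple

class PropertyTable:
    """Hypothetical in-memory model of Property records and feature links."""

    def __init__(self) -> None:
        # (property-name, property-value) -> integer Property ID
        self._ids: Dict[Tuple[str, str], int] = {}
        # feature ID -> set of Property IDs asserted for that feature
        self._features: Dict[str, Set[int]] = {}

    def property_id(self, name: str, value: str) -> int:
        """Return the ID for a name/value pair, creating the record if needed."""
        return self._ids.setdefault((name, value), len(self._ids) + 1)

    def add(self, feature: str, name: str, value: str) -> None:
        """Associate a feature with a name/value pair; pairs are shared."""
        self._features.setdefault(feature, set()).add(self.property_id(name, value))

    def values_for(self, name: str) -> List[str]:
        """All recorded values for a property name, in ascending order."""
        return sorted(value for (prop, value) in self._ids if prop == name)

    def properties_of(self, feature: str) -> List[Tuple[str, str]]:
        """All name/value pairs asserted for a feature, possibly contradictory."""
        ids = self._features.get(feature, set())
        return sorted(pair for pair, pid in self._ids.items() if pid in ids)

table = PropertyTable()
table.add("feature1", "essential", "yes")
table.add("feature1", "essential", "no")
table.add("feature2", "essential", "yes")
print(table.values_for("essential"))          # ['no', 'yes']
print(table.properties_of("feature1"))        # [('essential', 'no'), ('essential', 'yes')]
print(table.property_id("essential", "yes"))  # 1
```
-->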
        <Entity name="Diagram" keyType="name-string">
            <Notes>A functional diagram describes a network of chemical reactions, often comprising a single
                subsystem. A diagram is identified by a short name and contains a longer descriptive name.
                The actual diagram shows which functional roles guide the reactions along with the inputs
                and outputs; the database, however, only indicates which roles belong to a particular
                diagram's map.</Notes>
            <Fields>
                <Field name="name" type="text">
                    <Notes>Descriptive name of this diagram.</Notes>
                </Field>
            </Fields>
        </Entity>
        <Entity name="ExternalAliasOrg" keyType="name-string">
            <Notes>An external alias is a feature name, other than a FIG ID, that has a functional
                assignment. Functional assignments for external aliases are kept in a separate section of
                the database. This table contains a description of the relevant organism for an
                external alias functional assignment.</Notes>
            <Fields>
                <Field name="org" type="text">
                    <Notes>Descriptive name of the target organism for this external alias.</Notes>
                </Field>
            </Fields>
        </Entity>
        <Entity name="ExternalAliasFunc" keyType="name-string">
            <Notes>An external alias is a feature name, other than a FIG ID, that has a functional
                assignment. Functional assignments for external aliases are kept in a separate section of
                the database. This table contains the functional role for the external alias functional
                assignment.</Notes>
            <Fields>
                <Field name="func" type="text">
                    <Notes>Functional role for this external alias.</Notes>
                </Field>
            </Fields>
        </Entity>
        <Entity name="Family" keyType="id-string">
            <Notes>A _family_ (also called a [[FigFam]]) is a group of homologous features believed to have
                the same function. Families provide a mechanism for verifying the accuracy of functional assignments
                and are also used in [[Rapid Annotation]] and in determining phylogenetic trees.</Notes>
            <Fields>
                <Field name="function" type="text">
                    <Notes>The functional assignment expected for all PEGs in this family.</Notes>
                </Field>
                <Field name="size" type="int">
                    <Notes>The number of proteins in this family. This may be larger than the
                        number of PEGs included in the family, since the family may also contain external
                        IDs.</Notes>
                </Field>
            </Fields>
        </Entity>
        <Entity name="PDB" keyType="id-string">
            <Notes>A PDB is a Protein Data Bank entry containing information that can be used
                to determine the shape of the protein and the energies required to dock with it.
                The ID is the four-character name used on the [[http://www.rcsb.org PDB web site]].</Notes>
            <Fields>
                <Field name="docking-count" type="int">
                    <Notes>The number of ligands that have been docked against this PDB.</Notes>
                </Field>
            </Fields>
            <Indexes>
                <Index>
                    <IndexFields>
                        <IndexField name="docking-count" order="descending" />
                        <IndexField name="id" order="ascending" />
                    </IndexFields>
                </Index>
            </Indexes>
        </Entity>
        <Entity name="Ligand" keyType="id-string">
            <Notes>A Ligand is a chemical of interest in computing docking energies against a PDB.
                The ID of the ligand is an 8-digit ID number in the [[http://zinc.docking.org ZINC database]].</Notes>
            <Fields>
                <Field name="name" type="long-string">
                    <Notes>Chemical name of this ligand.</Notes>
                </Field>
            </Fields>
        </Entity>
    </Entities>
    <Relationships>
        <Relationship name="IsPresentOnProteinOf" from="CDD" to="Feature" arity="MM">
            <Notes>This relationship connects a feature to its CDD protein domains. The
                match score is included as intersection data.</Notes>
            <Fields>
                <Field name="score" type="float">
                    <Notes>This is the match score between the feature and the CDD. A
                        lower score is a better match.</Notes>
                </Field>
            </Fields>
            <FromIndex>
                <IndexFields>
                    <IndexField name="score" order="ascending" />
                </IndexFields>
            </FromIndex>
        </Relationship>
        <Relationship name="IsIdentifiedByCAS" from="Compound" to="CompoundCAS" arity="MM">
            <Notes>Relates a compound's CAS ID to the compound itself. Every CAS ID is
                associated with a compound, and some are associated with two compounds, but not
                all compounds have CAS IDs.</Notes>
        </Relationship>
        <Relationship name="IsIdentifiedByEC" from="Role" to="RoleEC" arity="MM">
            <Notes>Relates a role to its EC number. Every EC number is associated with a
                role, but not all roles have EC numbers.</Notes>
        </Relationship>
        <Relationship name="IsAliasOf" from="FeatureAlias" to="Feature" arity="MM">
            <Notes>Connects an alias to the feature it represents. Every alias connects
                to at least one feature, and a feature can connect to many aliases.</Notes>
        </Relationship>
        <Relationship name="HasCompoundName" from="Compound" to="CompoundName" arity="MM">
            <Notes>Connects a compound to its names. A compound generally has several
                names.</Notes>
            <Fields>
                <Field name="priority" type="int">
                    <Notes>Priority of this name, with 1 being the highest priority, 2
                        the next highest, and so forth.</Notes>
                </Field>
            </Fields>
            <FromIndex>
                <Notes>This index enables the application to view the names of a compound
                    in priority order.</Notes>
                <IndexFields>
                    <IndexField name="priority" order="ascending" />
                </IndexFields>
            </FromIndex>
        </Relationship>
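<!--
This short Python sketch shows how the priority field relates to the Compound label. Sorting by ascending priority mirrors the FromIndex above, and the label is the name whose priority is 1. The function names and sample names are hypothetical.

```python
from typing import List, Tuple

# Each entry is (compound name, priority), mirroring HasCompoundName;
# priority 1 is the highest.
NameEntry = Tuple[str, int]

def names_in_priority_order(names: List[NameEntry]) -> List[str]:
    """Names sorted by ascending priority, as the FromIndex returns them."""
    return [name for name, _ in sorted(names, key=lambda entry: entry[1])]

def display_label(names: List[NameEntry]) -> str:
    """The Compound label: the name whose priority is 1."""
    for name, priority in names:
        if priority == 1:
            return name
    raise ValueError("compound has no priority-1 name")

names = [("L-Histidine", 2), ("Histidine", 1), ("His", 3)]
print(names_in_priority_order(names))  # ['Histidine', 'L-Histidine', 'His']
print(display_label(names))            # Histidine
```
-->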
        <Relationship name="IsProteinForFeature" from="PDB" to="Feature" arity="MM">
            <Notes>Relates a PDB to features that produce highly similar proteins.</Notes>
            <Fields>
                <Field name="score" type="float">
                    <Notes>Similarity score for the comparison between the feature and
                        the PDB protein. A lower score indicates a better match.</Notes>
                </Field>
                <Field name="start-location" type="int">
                    <Notes>Starting location within the feature of the matching region.</Notes>
                </Field>
                <Field name="end-location" type="int">
                    <Notes>Ending location within the feature of the matching region.</Notes>
                </Field>
            </Fields>
            <ToIndex>
                <Notes>This index enables the application to view the PDBs of a
                    feature in order from the closest match to the furthest.</Notes>
                <IndexFields>
                    <IndexField name="score" order="ascending" />
                </IndexFields>
            </ToIndex>
            <FromIndex>
                <Notes>This index enables the application to view the features of
                    a PDB in order from the closest match to the furthest.</Notes>
                <IndexFields>
                    <IndexField name="score" order="ascending" />
                </IndexFields>
            </FromIndex>
        </Relationship>
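<!--
Each score index on IsProteinForFeature amounts to sorting matches by ascending score. The Python sketch below reproduces the ToIndex ordering (the PDBs of a feature, closest match first) over hypothetical in-memory records. The record type and sample IDs are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple

class PdbMatch(NamedTuple):
    """One hypothetical IsProteinForFeature row."""
    pdb: str
    feature: str
    score: float         # lower is a better match
    start_location: int  # start of the matching region within the feature
    end_location: int    # end of the matching region within the feature

def pdbs_by_feature(matches: List[PdbMatch]) -> Dict[str, List[str]]:
    """For each feature, list its PDBs from the closest match to the furthest."""
    grouped: Dict[str, List[PdbMatch]] = defaultdict(list)
    for match in matches:
        grouped[match.feature].append(match)
    return {
        feature: [m.pdb for m in sorted(group, key=lambda m: m.score)]
        for feature, group in grouped.items()
    }

matches = [
    PdbMatch("1abc", "feature1", 1e-20, 5, 120),
    PdbMatch("2xyz", "feature1", 1e-50, 1, 200),
    PdbMatch("1abc", "feature2", 1e-10, 30, 90),
]
print(pdbs_by_feature(matches))
# {'feature1': ['2xyz', '1abc'], 'feature2': ['1abc']}
```
-->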
526 :     <Relationship name="DocksWith" from="PDB" to="Ligand" arity="MM">
527 : parrello 1.52 <Notes>Indicates that a [[docking result]] exists between a PDB and a ligand. The
528 : parrello 1.49 docking result describes the energy required for the ligand to dock with
529 :     the protein described by the PDB. A lower energy indicates the ligand has a
530 :     good chance of disabling the protein. At the current time, only the best
531 :     docking results are kept.</Notes>
532 :     <Fields>
533 :     <Field name="reason" type="id-string">
534 :     <Notes>Indication of the reason for determining the docking result.
535 : parrello 1.52 A value of =Random= indicates the docking was attempted as a part
536 : parrello 1.49 of a random survey used to determine the docking characteristics of the
537 : parrello 1.52 PDB. A value of =Rich= indicates the docking was attempted because
538 : parrello 1.49 a low-energy docking result was predicted for the ligand with respect
539 :     to the PDB.</Notes>
540 :     </Field>
541 :     <Field name="tool" type="id-string">
542 :     <Notes>Name of the tool used to produce the docking result.</Notes>
543 :     </Field>
544 :     <Field name="total-energy" type="float">
545 :     <Notes>Total energy required for the ligand to dock with the PDB
546 :     protein, in kcal/mol. A negative value means energy is released.</Notes>
547 :     </Field>
548 :     <Field name="vanderwalls-energy" type="float">
549 :     <Notes>Docking energy in kcal/mol that results from the geometric fit
550 :     (Van der Waals force) between the PDB and the ligand.</Notes>
551 :     </Field>
552 :     <Field name="electrostatic-energy" type="float">
553 :     <Notes>Docking energy in kcal/mol that results from the electrostatic
554 : parrello 1.52 interaction between charges on the PDB and the ligand.</Notes>
555 : parrello 1.49 </Field>
556 :     </Fields>
557 :     <FromIndex>
558 :     <Notes>This index enables the application to view a PDB's docking results from
559 :     the lowest energy (best docking) to highest energy (worst docking).</Notes>
560 :     <IndexFields>
561 :     <IndexField name="total-energy" order="ascending" />
562 :     </IndexFields>
563 :     </FromIndex>
564 :     <ToIndex>
565 :     <Notes>This index enables the application to view a ligand's docking results from
566 : parrello 1.52 the lowest energy (best docking) to highest energy (worst docking).</Notes>
567 : parrello 1.49 </ToIndex>
568 :     </Relationship>
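The FromIndex on DocksWith orders a PDB's docking results by total energy, from best to worst. The Python sketch below shows that ordering over in-memory rows. It is only an illustration: the `DockingResult` class, the `docking_results_for_pdb` helper, the PDB and ligand IDs, the tool name and the energy values are all hypothetical and are not part of Sprout or ERDB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DockingResult:
    # Hypothetical in-memory copy of one DocksWith row; names mirror the XML fields.
    pdb: str
    ligand: str
    reason: str                  # "Random" or "Rich"
    tool: str
    total_energy: float          # kcal/mol; more negative means better docking
    vanderwalls_energy: float    # kcal/mol
    electrostatic_energy: float  # kcal/mol


def docking_results_for_pdb(results, pdb, limit=None):
    """Return one PDB's docking results from lowest (best) to highest (worst)
    total energy, the order the FromIndex on total-energy is meant to give."""
    rows = sorted((r for r in results if r.pdb == pdb),
                  key=lambda r: r.total_energy)
    return rows if limit is None else rows[:limit]


results = [
    DockingResult("1abc", "LIG1", "Random", "dock-tool", -5.0, -3.0, -2.0),
    DockingResult("1abc", "LIG2", "Rich", "dock-tool", -9.5, -6.0, -3.5),
    DockingResult("2xyz", "LIG1", "Random", "dock-tool", -12.0, -8.0, -4.0),
]
best = docking_results_for_pdb(results, "1abc")
print([r.ligand for r in best])  # ['LIG2', 'LIG1']
```

The ToIndex plays the same role from the ligand side: you would filter on `ligand` instead of `pdb`.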
569 : parrello 1.34 <Relationship name="IsFamilyForFeature" from="Family" to="Feature" arity="MM">
570 : parrello 1.31 <Notes>This relationship connects a protein family to all of its PEGs and connects
571 :     each PEG to all of its protein families.</Notes>
572 :     </Relationship>
573 : parrello 1.50 <Relationship name="IsSynonymGroupFor" from="SynonymGroup" to="Feature" arity="MM">
574 : parrello 1.27 <Notes>This relationship connects a synonym group to the features that make it
575 :     up.</Notes>
576 :     </Relationship>
577 : parrello 1.24 <Relationship name="HasFeature" from="Genome" to="Feature" arity="1M">
578 :     <Notes>This relationship connects a genome to all of its features. This
579 :     relationship is redundant in a sense, because the genome ID is part
580 :     of the feature ID; however, it makes the creation of certain queries more
581 :     convenient because you can drag in filtering information for a feature's
582 :     genome.</Notes>
583 :     <Fields>
584 :     <Field name="type" type="key-string">
585 :     <Notes>Feature type (e.g., peg, rna).</Notes>
586 :     </Field>
587 :     </Fields>
588 : parrello 1.38 <FromIndex>
589 : parrello 1.24 <Notes>This index enables the application to view the features of a
590 :     Genome sorted by type.</Notes>
591 :     <IndexFields>
592 :     <IndexField name="type" order="ascending" />
593 :     </IndexFields>
594 : parrello 1.38 </FromIndex>
595 : parrello 1.24 </Relationship>
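The HasFeature Notes point out that the genome ID is already embedded in the feature ID. The Python sketch below shows what an application would have to do without this relationship: parse every ID to recover its genome and type. It assumes SEED-style IDs of the form `fig|<genome ID>.<type>.<number>`, but this file does not define the ID format, so treat that as an assumption. `split_feature_id` and `features_by_genome` are hypothetical helpers.

```python
import re
from collections import defaultdict

# Assumed feature ID format: fig|<genome ID>.<type>.<number>, e.g. "fig|83333.1.peg.4".
FEATURE_ID = re.compile(r"^fig\|(\d+\.\d+)\.([A-Za-z]+)\.(\d+)$")


def split_feature_id(fid):
    """Split a feature ID into (genome ID, feature type, ordinal number)."""
    m = FEATURE_ID.match(fid)
    if m is None:
        raise ValueError(f"unrecognized feature ID: {fid!r}")
    genome, ftype, number = m.groups()
    return genome, ftype, int(number)


def features_by_genome(fids):
    """Group feature IDs by genome and sort each group by type, then number.
    This roughly mirrors what HasFeature and its FromIndex on type provide."""
    groups = defaultdict(list)
    for fid in fids:
        genome, ftype, number = split_feature_id(fid)
        groups[genome].append((ftype, number, fid))
    return {g: [fid for _, _, fid in sorted(rows)] for g, rows in groups.items()}


print(split_feature_id("fig|83333.1.peg.4"))  # ('83333.1', 'peg', 4)
```

Storing the type as a relationship field lets the database do this grouping and sorting through an index, with no string parsing.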
596 : parrello 1.1 <Relationship name="HasContig" from="Genome" to="Contig" arity="1M">
597 :     <Notes>This relationship connects a genome to the contigs that contain the actual genetic
598 :     information.</Notes>
599 :     </Relationship>
600 :     <Relationship name="ComesFrom" from="Genome" to="Source" arity="MM">
601 :     <Notes>This relationship connects a genome to the sources that mapped it. A genome can
602 :     come from a single source or from a cooperation among multiple sources.</Notes>
603 :     </Relationship>
604 :     <Relationship name="IsMadeUpOf" from="Contig" to="Sequence" arity="1M">
605 :     <Notes>A contig is stored in the database as an ordered set of sequences. By splitting the
606 :     contig into sequences, we get a performance boost from only needing to keep small portions
607 :     of a contig in memory at any one time. This relationship connects the contig to its
608 :     constituent sequences.</Notes>
609 :     <Fields>
610 :     <Field name="len" type="int">
611 : parrello 1.15 <Notes>Length of the sequence.</Notes>
612 :     </Field>
613 : parrello 1.1 <Field name="start-position" type="int">
614 : parrello 1.15 <Notes>Index (1-based) of the point in the contig where this
615 :     sequence starts.</Notes>
616 :     </Field>
617 : parrello 1.1 </Fields>
618 :     <FromIndex>
619 :     <Notes>This index enables the application to find all of the sequences in
620 : parrello 1.8 a contig in order, and makes it easier to find a particular residue section.</Notes>
621 : parrello 1.1 <IndexFields>
622 :     <IndexField name="start-position" order="ascending" />
623 :     <IndexField name="len" order="ascending" />
624 :     </IndexFields>
625 :     </FromIndex>
626 :     </Relationship>
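The FromIndex on IsMadeUpOf returns a contig's sequences sorted by start-position, so a residue can be located with a binary search. The Python sketch below works on an in-memory list of (start-position, len) pairs. The `locate_residue` name and the sequence sizes in the example are hypothetical, and a real query would go through ERDB rather than a Python list.

```python
from bisect import bisect_right


def locate_residue(sequences, residue):
    """Find which stored sequence of a contig holds a 1-based residue.

    sequences: (start_position, len) pairs for one contig, sorted by
    start_position as the FromIndex returns them.
    Returns (sequence index, 0-based offset within that sequence), or None
    if the residue lies outside the contig.
    """
    starts = [start for start, _ in sequences]
    i = bisect_right(starts, residue) - 1  # last sequence starting at or before residue
    if i < 0:
        return None
    start, length = sequences[i]
    if residue >= start + length:
        return None  # past the end of the last sequence
    return i, residue - start


seqs = [(1, 1000), (1001, 1000), (2001, 500)]
print(locate_residue(seqs, 1500))  # (1, 499)
```

Only the one sequence returned by the search needs to be read into memory. That is the performance benefit the Notes describe.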
627 :     <Relationship name="IsTargetOfAnnotation" from="Feature" to="Annotation" arity="1M">
628 :     <Notes>This relationship connects a feature to its annotations.</Notes>
629 :     </Relationship>
630 :     <Relationship name="MadeAnnotation" from="SproutUser" to="Annotation" arity="1M">
631 :     <Notes>This relationship connects an annotation to the user who made it.</Notes>
632 :     </Relationship>
633 :     <Relationship name="ParticipatesIn" from="Genome" to="Subsystem" arity="MM">
634 :     <Notes>This relationship connects a subsystem to the genomes that use
635 :     it. If the subsystem has been curated for the genome, then the subsystem's roles will also be
636 : parrello 1.52 connected to the genome features through the *SSCell* object.</Notes>
637 : parrello 1.15 <Fields>
638 :     <Field name="variant-code" type="key-string">
639 : parrello 1.20 <Notes>Code indicating the subsystem variant to which this
640 : parrello 1.15 genome belongs. Each subsystem can have multiple variants. A variant
641 : parrello 1.52 code of =-1= indicates that the genome does not have a functional
642 :     variant of the subsystem. A variant code of =0= indicates that
643 : parrello 1.20 the genome's participation is considered uncertain.</Notes>
644 : parrello 1.15 </Field>
645 :     </Fields>
646 :     <ToIndex>
647 :     <Notes>This index enables the application to find all of the genomes using
648 :     a subsystem in order by variant code, which is how we wish to display them
649 :     in the spreadsheets.</Notes>
650 :     <IndexFields>
651 :     <IndexField name="variant-code" order="ascending" />
652 :     </IndexFields>
653 :     </ToIndex>
654 : parrello 1.1 </Relationship>
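The ToIndex on variant-code controls how genomes are grouped in the spreadsheet display. The Python sketch below shows that grouping, using the two special codes described in the Notes. The genome IDs are placeholders, and `genomes_by_variant` and `describe_variant` are hypothetical helpers. Because variant-code is a key-string, the sketch compares codes as plain strings. Under plain string ordering, a multi-digit code such as `10` would sort before `2`.

```python
from collections import defaultdict

# Meanings taken from the variant-code Notes; any other code is an ordinary variant.
SPECIAL_CODES = {"-1": "no functional variant", "0": "participation uncertain"}


def describe_variant(code):
    return SPECIAL_CODES.get(code, f"variant {code}")


def genomes_by_variant(participations):
    """Group (genome ID, variant-code) pairs for one subsystem by variant code.

    Codes are sorted as strings, as an ascending index on a string field would."""
    groups = defaultdict(list)
    for genome, code in participations:
        groups[code].append(genome)
    return {code: sorted(groups[code]) for code in sorted(groups)}


rows = [("g1", "1"), ("g2", "-1"), ("g3", "0"), ("g4", "1")]
for code, genomes in genomes_by_variant(rows).items():
    print(code, describe_variant(code), genomes)
```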
655 :     <Relationship name="OccursInSubsystem" from="Role" to="Subsystem" arity="MM">
656 :     <Notes>This relationship connects a role to the subsystems in which it occurs.</Notes>
657 : parrello 1.15 <Fields>
658 : parrello 1.50 <Field name="abbr" type="name-string">
659 :     <Notes>Abbreviated name for the role, generally non-unique, but useful
660 :     in column headings for HTML tables.</Notes>
661 :     </Field>
662 : parrello 1.15 <Field name="column-number" type="int">
663 :     <Notes>Column number for this role in the specified subsystem's
664 :     spreadsheet.</Notes>
665 :     </Field>
666 :     </Fields>
667 :     <ToIndex>
668 :     <Notes>This index enables the application to see the subsystem roles
669 :     in column order. The ordering of the roles is usually significant,
670 :     so it is important to preserve it.</Notes>
671 :     <IndexFields>
672 :     <IndexField name="column-number" order="ascending" />
673 :     </IndexFields>
674 :     </ToIndex>
675 : parrello 1.1 </Relationship>
676 :     <Relationship name="IsGenomeOf" from="Genome" to="SSCell" arity="1M">
677 :     <Notes>This relationship connects a subsystem's spreadsheet cell to the
678 :     genome for the spreadsheet row.</Notes>
679 :     </Relationship>
680 :     <Relationship name="IsRoleOf" from="Role" to="SSCell" arity="1M">
681 :     <Notes>This relationship connects a subsystem's spreadsheet cell to the
682 :     role for the spreadsheet column.</Notes>
683 :     </Relationship>
684 :     <Relationship name="ContainsFeature" from="SSCell" to="Feature" arity="MM">
685 :     <Notes>This relationship connects a subsystem's spreadsheet cell to the
686 :     features assigned to it.</Notes>
687 : parrello 1.15 <Fields>
688 :     <Field name="cluster-number" type="int">
689 :     <Notes>ID of this feature's cluster. Clusters represent families of
690 :     related proteins participating in a subsystem.</Notes>
691 :     </Field>
692 :     </Fields>
693 :     </Relationship>
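OccursInSubsystem gives each role a column number, so roles label the spreadsheet columns and genomes label the rows. Each SSCell holds the features for one genome/role pair. The Python sketch below assembles that grid from already-joined rows. `build_spreadsheet` and the sample role, genome and feature names are hypothetical.

```python
from collections import defaultdict


def build_spreadsheet(role_columns, cells):
    """Assemble a subsystem spreadsheet in memory.

    role_columns: role -> column-number, as stored in OccursInSubsystem.
    cells: (genome, role, feature ID) triples, one per ContainsFeature row,
    with genome and role already looked up through IsGenomeOf and IsRoleOf.
    Returns (header, rows): header lists roles in column order, and rows maps
    each genome to one list of feature IDs per column.
    """
    header = sorted(role_columns, key=role_columns.get)
    column_of = {role: i for i, role in enumerate(header)}
    rows = defaultdict(lambda: [[] for _ in header])
    for genome, role, fid in cells:
        rows[genome][column_of[role]].append(fid)
    return header, dict(rows)


header, rows = build_spreadsheet(
    {"RoleB": 2, "RoleA": 1},
    [("g1", "RoleA", "f1"), ("g1", "RoleB", "f2"), ("g2", "RoleB", "f3")],
)
print(header)      # ['RoleA', 'RoleB']
print(rows["g2"])  # [[], ['f3']]
```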
694 :     <Relationship name="IsAComponentOf" from="Compound" to="Reaction" arity="MM">
695 :     <Notes>This relationship connects a reaction to the compounds that participate
696 :     in it.</Notes>
697 :     <Fields>
698 :     <Field name="product" type="boolean">
699 :     <Notes>TRUE if the compound is a product of the reaction, FALSE if
700 :     it is a substrate. When a reaction is written on paper in
701 :     chemical notation, the substrates are left of the arrow and the
702 :     products are to the right. Sorting on this field will cause
703 :     the substrates to appear first, followed by the products. If the
704 :     reaction is reversible, then the notion of substrates and products
705 :     is not as intuitive; however, a value here of FALSE still puts the
706 :     compound left of the arrow and a value of TRUE still puts it to the
707 :     right.</Notes>
708 :     </Field>
709 : parrello 1.19 <Field name="stoichiometry" type="key-string">
710 : parrello 1.15 <Notes>Number of molecules of the compound that participate in a
711 :     single instance of the reaction. For example, if a reaction
712 : parrello 1.19 produces two water molecules, the stoichiometry of water for the
713 : parrello 1.15 reaction would be two. When a reaction is written on paper in
714 : parrello 1.19 chemical notation, the stoichiometry is the number next to the
715 : parrello 1.15 chemical formula of the compound.</Notes>
716 :     </Field>
717 :     <Field name="main" type="boolean">
718 :     <Notes>TRUE if this compound is one of the main participants in
719 :     the reaction, else FALSE. It is permissible for none of the
720 :     compounds in the reaction to be considered main, in which
721 :     case this value would be FALSE for all of the relevant
722 :     compounds.</Notes>
723 :     </Field>
724 :     <Field name="loc" type="key-string">
725 :     <Notes>An optional character string that indicates the relative
726 :     position of this compound in the reaction's chemical formula. The
727 :     location affects the way the compounds present as we cross the
728 :     relationship from the reaction side. The product/substrate flag
729 :     comes first, then the value of this field, then the main flag.
730 :     The default value is an empty string; however, the empty string
731 :     sorts first, so if this field is used, it should probably be
732 :     used for every compound in the reaction.</Notes>
733 :     </Field>
734 : parrello 1.19 <Field name="discriminator" type="int">
735 :     <Notes>A unique ID for this record. The discriminator does not
736 :     provide any useful data, but it prevents identical records from
737 :     being collapsed by the SELECT DISTINCT command used by ERDB to
738 :     retrieve data.</Notes>
739 :     </Field>
740 : parrello 1.15 </Fields>
741 :     <ToIndex>
742 :     <Notes>This index presents the compounds in the reaction in the
743 :     order they should be displayed when writing it in chemical notation.
744 :     All the substrates appear before all the products; within each group, compounds
745 :     are ordered by location, and among compounds with the same location the main
    compounds appear first.</Notes>
746 : parrello 1.19 <IndexFields>
747 :     <IndexField name="product" order="ascending" />
748 :     <IndexField name="loc" order="ascending" />
749 :     <IndexField name="main" order="descending" />
750 :     </IndexFields>
751 : parrello 1.15 </ToIndex>
752 : parrello 1.1 </Relationship>
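The Python sketch below shows how the ToIndex ordering (product ascending, loc ascending, main descending) turns IsAComponentOf rows into a written equation. The following are display assumptions, not Sprout conventions:

- the `Component` class and the `format_reaction` function;
- the `<=>` arrow for reversible reactions;
- omitting a coefficient of 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    # Hypothetical in-memory copy of one IsAComponentOf row.
    compound: str
    product: bool        # False = substrate (left of arrow), True = product
    stoichiometry: str   # stored as a string in the schema, e.g. "2"
    main: bool
    loc: str = ""


def display_key(c):
    # ToIndex order: product ascending, loc ascending, main descending.
    return (c.product, c.loc, not c.main)


def format_reaction(components, reversible=False):
    ordered = sorted(components, key=display_key)  # stable: ties keep input order

    def side(is_product):
        terms = []
        for c in ordered:
            if c.product == is_product:
                # Display convention (an assumption): omit a coefficient of 1.
                coeff = "" if c.stoichiometry in ("", "1") else c.stoichiometry + " "
                terms.append(coeff + c.compound)
        return " + ".join(terms)

    arrow = "<=>" if reversible else "=>"
    return f"{side(False)} {arrow} {side(True)}"


water = [
    Component("H2O", True, "2", True),
    Component("H2", False, "2", True),
    Component("O2", False, "1", True),
]
print(format_reaction(water))  # 2 H2 + O2 => 2 H2O
```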
753 :     <Relationship name="IsLocatedIn" from="Feature" to="Contig" arity="MM">
754 :     <Notes>This relationship connects a feature to the contig segments that work together
755 :     to effect it. The segments are numbered sequentially starting from 1. The database is
756 :     required to place an upper limit on the length of each segment. If a segment is longer
757 : parrello 1.52 than the maximum, it must be broken into smaller segments. The upper limit enables applications
758 :     to locate all features that contain a specific residue. For example, if the upper limit
759 :     is 100 and we are looking for a feature that contains residue 234 of contig *ABC*, we
760 :     can look for features with a begin point between 135 and 333. The results can then be
761 :     filtered by direction and length of the segment.</Notes>
762 : parrello 1.1 <Fields>
763 :     <Field name="locN" type="int">
764 : parrello 1.8 <Notes>Sequence number of this segment.</Notes>
765 :     </Field>
766 : parrello 1.1 <Field name="beg" type="int">
767 : parrello 1.8 <Notes>Index (1-based) of the first residue in the contig that
768 :     belongs to the segment.</Notes>
769 :     </Field>
770 : parrello 1.1 <Field name="len" type="int">
771 : parrello 1.8 <Notes>Number of residues in the segment. A length of 0 identifies
772 :     a specific point between residues. This is the point before the residue if the direction
773 :     is forward and the point after the residue if the direction is backward.</Notes>
774 :     </Field>
775 : parrello 1.1 <Field name="dir" type="char">
776 : parrello 1.52 <Notes>Direction of the segment: =+= if it is forward and
777 :     =-= if it is backward.</Notes>
778 : parrello 1.8 </Field>
779 : parrello 1.1 </Fields>
780 : parrello 1.45 <FromIndex>
781 : parrello 1.1 <Notes>This index allows the application to find all the segments of a feature in
782 : parrello 1.8 the proper order.</Notes>
783 : parrello 1.1 <IndexFields>
784 :     <IndexField name="locN" order="ascending" />
785 :     </IndexFields>
786 :     </FromIndex>
787 :     <ToIndex>
788 :     <Notes>This index is the one used by applications to find all the feature
789 :     segments that contain a specific residue.</Notes>
790 :     <IndexFields>
791 :     <IndexField name="beg" order="ascending" />
792 :     </IndexFields>
793 :     </ToIndex>
794 :     </Relationship>
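The two-step lookup in the IsLocatedIn Notes can be sketched in Python as follows. The sketch makes two assumptions:

- The maximum segment length is 100, as in the Notes' example.
- A backward segment's `beg` is its rightmost residue. This is what makes the 135 to 333 window in the example correct for both directions.

`Segment`, `covers` and `features_containing` are hypothetical names.

```python
from dataclasses import dataclass

MAX_SEGMENT_LEN = 100  # hypothetical limit, matching the example in the Notes


@dataclass(frozen=True)
class Segment:
    # Hypothetical in-memory copy of one IsLocatedIn row.
    feature: str
    contig: str
    beg: int        # 1-based begin point
    length: int     # the "len" field
    direction: str  # the "dir" field: "+" (forward) or "-" (backward)


def covers(seg, residue):
    if seg.direction == "+":
        return seg.beg <= residue <= seg.beg + seg.length - 1
    # Assumption: a backward segment begins at its rightmost residue
    # and extends leftward for `length` residues.
    return seg.beg - seg.length + 1 <= residue <= seg.beg


def features_containing(segments, contig, residue, max_len=MAX_SEGMENT_LEN):
    # Step 1: range scan on beg, which the ToIndex makes cheap.
    lo, hi = residue - max_len + 1, residue + max_len - 1
    candidates = [s for s in segments
                  if s.contig == contig and lo <= s.beg <= hi]
    # Step 2: filter the candidates by direction and length.
    return sorted({s.feature for s in candidates if covers(s, residue)})


segs = [
    Segment("f1", "ABC", 200, 50, "+"),   # residues 200..249
    Segment("f2", "ABC", 300, 100, "-"),  # residues 201..300 (backward)
    Segment("f3", "ABC", 135, 100, "+"),  # residues 135..234
    Segment("f4", "ABC", 100, 100, "+"),  # beg outside the 135..333 window
    Segment("f5", "ABC", 240, 10, "+"),   # in the window, but misses 234
    Segment("f6", "XYZ", 234, 1, "+"),    # different contig
]
print(features_containing(segs, "ABC", 234))  # ['f1', 'f2', 'f3']
```

For residue 234 the window computed in step 1 is 135 to 333, the same range given in the Notes.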
795 : parrello 1.8 <Relationship name="HasProperty" from="Feature" to="Property" arity="MM">
796 :     <Notes>This relationship connects a feature to its known property values.
797 :     The relationship contains text data that indicates the paper or organization
798 :     that discovered evidence that the feature possesses the property. So, for
799 :     example, if two papers presented evidence that a feature is essential,
800 :     there would be an instance of this relationship for each.</Notes>
801 :     <Fields>
802 :     <Field name="evidence" type="text">
803 :     <Notes>URL or citation of the paper or
804 :     institution that reported evidence of the relevant feature possessing
805 :     the specified property value.</Notes>
806 :     </Field>
807 :     </Fields>
808 :     </Relationship>
809 :     <Relationship name="RoleOccursIn" from="Role" to="Diagram" arity="MM">
810 :     <Notes>This relationship connects a role to the diagrams on which it
811 :     appears. A role frequently identifies an enzyme, and can appear in many
812 :     diagrams. A diagram generally contains many different roles.</Notes>
813 :     </Relationship>
814 :     <Relationship name="HasSSCell" from="Subsystem" to="SSCell" arity="1M">
815 :     <Notes>This relationship connects a subsystem to the spreadsheet cells
816 :     used to analyze and display it. The cells themselves can be thought of
817 :     as a grid with Roles on one axis and Genomes on the other. The
818 :     various features of the subsystem are then assigned to the cells.</Notes>
819 :     </Relationship>
820 :     <Relationship name="IsTrustedBy" from="SproutUser" to="SproutUser" arity="MM">
821 :     <Notes>This relationship identifies the users trusted by each
822 :     particular user. When viewing functional assignments, the
823 :     assignment displayed is the most recent one by a user trusted
824 :     by the current user. The current user implicitly trusts himself.
825 :     If no trusted users are specified in the database, the user
826 : parrello 1.52 also implicitly trusts the user =FIG=.</Notes>
827 : parrello 1.8 </Relationship>
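The display rule in the IsTrustedBy Notes can be sketched in Python. The sketch reads "no trusted users are specified" as applying to the current user. That reading, the (user, timestamp, function) tuple layout, the integer timestamps and the `displayed_assignment` name are all assumptions.

```python
def displayed_assignment(assignments, current_user, trusts):
    """Pick the functional assignment to show for one feature.

    assignments: (user, timestamp, function) tuples for the feature.
    trusts: user -> set of users that user trusts (IsTrustedBy).
    Returns the most recent function assigned by a trusted user, or None.
    """
    trusted = set(trusts.get(current_user, ()))
    if not trusted:
        trusted.add("FIG")     # default when no trusted users are recorded
    trusted.add(current_user)  # users always trust themselves
    candidates = [a for a in assignments if a[0] in trusted]
    if not candidates:
        return None
    return max(candidates, key=lambda a: a[1])[2]


assignments = [("FIG", 10, "kinase"), ("alice", 5, "hypothetical protein"),
               ("bob", 20, "phosphatase")]
print(displayed_assignment(assignments, "alice", {}))  # kinase
```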
828 : parrello 1.15 <Relationship name="ConsistsOfRoles" from="RoleSubset" to="Role" arity="MM">
829 :     <Notes>This relationship connects a role subset to the roles that it covers.
830 :     A subset is, essentially, a named group of roles belonging to a specific
831 :     subsystem, and this relationship effects that. Note that while a role
832 :     may belong to many subsystems, a subset belongs to only one subsystem,
833 :     and all roles in the subset must have that subsystem in common.</Notes>
834 :     </Relationship>
835 :     <Relationship name="ConsistsOfGenomes" from="GenomeSubset" to="Genome" arity="MM">
836 :     <Notes>This relationship connects a genome subset to the genomes that it covers.
837 :     A subset is, essentially, a named group of genomes participating in a specific
838 :     subsystem, and this relationship effects that. Note that while a genome
839 :     may belong to many subsystems, a subset belongs to only one subsystem,
840 :     and all genomes in the subset must have that subsystem in common.</Notes>
841 :     </Relationship>
842 :     <Relationship name="HasRoleSubset" from="Subsystem" to="RoleSubset" arity="1M">
843 :     <Notes>This relationship connects a subsystem to its constituent
844 :     role subsets. Note that some roles in a subsystem may not belong to a
845 :     subset, so the relationship between roles and subsystems cannot be
846 :     derived from the relationships going through the subset.</Notes>
847 :     </Relationship>
848 :     <Relationship name="HasGenomeSubset" from="Subsystem" to="GenomeSubset" arity="1M">
849 :     <Notes>This relationship connects a subsystem to its constituent
850 :     genome subsets. Note that some genomes in a subsystem may not belong to a
851 :     subset, so the relationship between genomes and subsystems cannot be
852 :     derived from the relationships going through the subset.</Notes>
853 :     </Relationship>
854 :     <Relationship name="Catalyzes" from="Role" to="Reaction" arity="MM">
855 :     <Notes>This relationship connects a role to the reactions it catalyzes.
856 :     A role describes a protein function, frequently an enzymatic activity that
857 :     catalyzes certain chemical reactions. A single reaction can be catalyzed by many roles,
858 :     and a role can catalyze many reactions.</Notes>
859 :     </Relationship>
860 : parrello 1.39 <Relationship name="HasRoleInSubsystem" from="Feature" to="Subsystem" arity="MM">
861 :     <Notes>This relationship connects a feature to the subsystems in which it
862 :     participates. This is technically redundant information, but it is used
863 : parrello 1.52 so often that it gets its own table for performance reasons.</Notes>
864 : parrello 1.40 <Fields>
865 :     <Field name="genome" type="name-string">
866 :     <Notes>ID of the genome containing the feature.</Notes>
867 :     </Field>
868 :     <Field name="type" type="key-string">
869 :     <Notes>Feature type (e.g., peg, rna).</Notes>
870 :     </Field>
871 :     </Fields>
872 :     <ToIndex>
873 :     <Notes>This index enables the application to view the features of a
874 :     subsystem sorted by genome and feature type.</Notes>
875 :     <IndexFields>
876 :     <IndexField name="genome" order="ascending" />
877 :     <IndexField name="type" order="ascending" />
878 :     </IndexFields>
879 :     </ToIndex>
880 : parrello 1.39 </Relationship>
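The Notes call HasRoleInSubsystem redundant. The Python sketch below shows how its rows could be derived from HasSSCell and ContainsFeature. This is the join the extra table saves at query time. The function name and the in-memory inputs are hypothetical, and Sprout's loader may build the table differently.

```python
def derive_role_in_subsystem(has_sscell, contains_feature, feature_info):
    """Rebuild HasRoleInSubsystem-style rows from the spreadsheet relationships.

    has_sscell: (subsystem, cell) pairs (HasSSCell; each cell has one subsystem).
    contains_feature: (cell, feature) pairs (ContainsFeature).
    feature_info: feature -> (genome ID, feature type).
    Returns unique (feature, subsystem, genome, type) rows ordered by
    subsystem, genome and type, like the ToIndex.
    """
    subsystem_of = {cell: subsystem for subsystem, cell in has_sscell}
    rows = set()  # a feature in several cells of one subsystem yields one row
    for cell, fid in contains_feature:
        genome, ftype = feature_info[fid]
        rows.add((fid, subsystem_of[cell], genome, ftype))
    return sorted(rows, key=lambda r: (r[1], r[2], r[3], r[0]))


derived = derive_role_in_subsystem(
    [("SS1", "c1"), ("SS1", "c2"), ("SS2", "c3")],
    [("c1", "f1"), ("c2", "f1"), ("c3", "f2")],
    {"f1": ("g1", "peg"), "f2": ("g2", "rna")},
)
print(derived)  # [('f1', 'SS1', 'g1', 'peg'), ('f2', 'SS2', 'g2', 'rna')]
```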
881 : parrello 1.1 </Relationships>
882 :     </Database>
