Annotation of /Sprout/SproutDBD.xml

Revision 1.3

1 : parrello 1.1 <?xml version="1.0" encoding="utf-8" ?>
2 :     <Database>
3 :     <Title>Sprout Genome and Subsystem Database</Title>
4 :     <Entities>
5 :     <Entity name="Genome" keyType="name-string">
6 :     <Notes>A [i]genome[/i] contains the sequence data for a particular individual organism.</Notes>
7 :     <Fields>
8 :     <Field name="genus" type="name-string">
9 :     <Notes>Genus of the relevant organism.</Notes>
10 :     <DataGen pass="1">RandParam('streptococcus', 'staphyloccocus', 'felis', 'homo', 'ficticio', 'strangera', 'escherischia', 'carborunda')</DataGen>
11 :     </Field>
12 :     <Field name="species" type="name-string">
13 :     <Notes>Species of the relevant organism.</Notes>
14 :     <DataGen pass="1">StringGen('PKVKVKVKVKV')</DataGen>
15 :     </Field>
16 :     <Field name="unique-characterization" type="medium-string">
17 :     <Notes>The unique characterization identifies the particular organism instance from which the
18 :     genome is taken. It is possible to have in the database more than one genome for a
19 :     particular species, and every individual organism has variations in its DNA.</Notes>
20 :     <DataGen>StringGen('PKVKVK999')</DataGen>
21 :     </Field>
22 :     <Field name="access-code" type="key-string">
23 :     <Notes>The access code determines which users can look at the data relating to this genome.
24 :     Each user is associated with a set of access codes. In order to view a genome, one of
25 :     the user's access codes must match this value.</Notes>
26 :     <DataGen>RandParam('low','medium','high')</DataGen>
27 :     </Field>
28 : parrello 1.3 <Field name="taxonomy" type="text">
29 :     <Notes>The taxonomy string contains the full taxonomy of the organism, with individual elements
30 :     separated by semicolons (and optional white space), starting with the domain and ending with
31 :     the disambiguated genus and species (which is the organism's scientific name plus an
32 :     identifying string).</Notes>
33 :     <DataGen pass="2">join('; ', (RandParam('bacteria', 'archaea', 'eukaryote', 'virus', 'environmental'),
34 :     ListGen('PKVKVKVK', 5), $this->{genus}, $this->{species}))</DataGen>
35 :     </Field>
36 :     <Field name="group" relation="GenomeGroups" type="name-string">
37 :     <Notes>The group identifies a special grouping of organisms that would be displayed on a particular
38 :     page or of particular interest to a research group or web site. A single genome can belong to multiple
39 :     such groups or none at all.</Notes>
40 :     </Field>
41 : parrello 1.1 </Fields>
42 :     <Indexes>
43 :     <Index>
44 :     <Notes>This index allows the applications to find all genomes associated with
45 :     a specific access code, so that a complete list of the genomes users can view
46 :     may be generated.</Notes>
47 :     <IndexFields>
48 :     <IndexField name="access-code" order="ascending" />
49 :     <IndexField name="genus" order="ascending" />
50 :     <IndexField name="species" order="ascending" />
51 :     <IndexField name="unique-characterization" order="ascending" />
52 :     </IndexFields>
53 :     </Index>
54 :     <Index Unique="false">
55 :     <Notes>This index allows the applications to find all genomes for a particular
56 :     species.</Notes>
57 :     <IndexFields>
58 :     <IndexField name="genus" order="ascending" />
59 :     <IndexField name="species" order="ascending" />
60 :     <IndexField name="unique-characterization" order="ascending" />
61 :     </IndexFields>
62 :     </Index>
63 :     </Indexes>
64 :     </Entity>
65 :     <Entity name="Source" keyType="medium-string">
66 :     <Notes>A [i]source[/i] describes a place from which genome data was taken. This can be an organization
67 :     or a paper citation.</Notes>
68 :     <Fields>
69 :     <Field name="URL" type="string" relation="SourceURL">
70 :     <Notes>URL of the paper cited or of the organization's web site. This field is optional.</Notes>
71 :     <DataGen>"http://www.conservativecat.com/Ferdy/TestTarget.php?Source=" . $this->{id}</DataGen>
72 :     </Field>
73 :     <Field name="description" type="text">
74 :     <Notes>Description of the source. The description can be a street address or a citation.</Notes>
75 :     <DataGen>$this->{id} . ': ' . StringGen(IntGen(50,200))</DataGen>
76 :     </Field>
77 :     </Fields>
78 :     </Entity>
79 :     <Entity name="Contig" keyType="name-string">
80 :     <Notes>A [i]contig[/i] is a contiguous run of residues. The contig's ID consists of the
81 :     genome ID followed by a name that identifies which contig this is for the parent genome. As
82 :     is the case with all keys in this database, the individual components are separated by a
83 :     period.
84 :     [p]A contig can contain over a million residues. For performance reasons, therefore,
85 :     the contig is split into multiple pieces called [i]sequences[/i]. The sequences
86 :     contain the characters that represent the residues as well as data on the quality of
87 :     the residue identification.</Notes>
88 :     </Entity>
89 :     <Entity name="Sequence" keyType="name-string">
90 :     <Notes>A [i]sequence[/i] is a continuous piece of a [i]contig[/i]. Contigs are split into
91 :     sequences so that we don't have to have the entire contig in memory when we are
92 :     manipulating it. The key of the sequence is the contig ID followed by the index of
93 :     the begin point.</Notes>
94 :     <Fields>
95 :     <Field name="sequence" type="text">
96 :     <Notes>String consisting of the residues. Each residue is described by a single
97 :     character in the string.</Notes>
98 :     <DataGen>RandChars("ACGT", IntGen(100,400))</DataGen>
99 :     </Field>
100 :     <Field name="quality-vector" type="text">
101 :     <Notes>String describing the quality data for each residue. Individual values will
102 :     be separated by periods. Each value represents the negative exponent of the probability
103 :     of error. Thus, for example, a quality of 30 indicates the probability of error is
104 :     10^-30. A higher quality number indicates a better chance of a correct match. It is possible
105 :     that the quality data is not known for a sequence. If that is the case, the quality
106 :     vector will contain the string [b]unknown[/b].</Notes>
107 :     <DataGen>unknown</DataGen>
108 :     </Field>
109 :     </Fields>
110 :     </Entity>
111 :     <Entity name="Feature" keyType="name-string">
112 :     <Notes>A [i]feature[/i] is a part of a genome that is of special interest. Features
113 :     may be spread across multiple contigs of a genome, but never across more than
114 :     one genome. Features can be assigned to roles via spreadsheet cells,
115 :     and are the targets of annotation.</Notes>
116 :     <Fields>
117 :     <Field name="feature-type" type="string">
118 :     <Notes>Code indicating the type of this feature.</Notes>
119 :     <DataGen>RandParam('peg','rna')</DataGen>
120 :     </Field>
121 :     <Field name="alias" type="name-string" relation="FeatureAlias">
122 :     <Notes>Alternative name for this feature. A feature can have many aliases.</Notes>
123 :     <DataGen testCount="3">StringGen('Pgi|99999', 'Puni|XXXXXX', 'PAAAAAA999')</DataGen>
124 :     </Field>
125 :     <Field name="translation" type="text" relation="FeatureTranslation">
126 :     <Notes>[i](optional)[/i] A translation of this feature's residues into character codes, formed by concatenating
127 :     the pieces of the feature together.</Notes>
128 :     <DataGen testCount="0"></DataGen>
129 :     </Field>
130 :     <Field name="upstream-sequence" type="text" relation="FeatureUpstream">
131 :     <Notes>Upstream sequence of the feature. This includes residues preceding the feature as well as some of
132 :     the feature's initial residues.</Notes>
133 :     <DataGen testCount="0"></DataGen>
134 :     </Field>
135 :     <Field name="active" type="boolean">
136 :     <Notes>TRUE if this feature is still considered valid, FALSE if it has been logically deleted.</Notes>
137 :     <DataGen>1</DataGen>
138 :     </Field>
139 :     <Field name="link" type="text" relation="FeatureLink">
140 :     <Notes>Web hyperlink for this feature. A feature can have no hyperlinks or it can have many. The
141 :     links are to other websites that have useful information about the gene that the feature represents, and
142 :     are coded as raw HTML, using [b]&lt;a href="[i]link[/i]"&gt;[i]text[/i]&lt;/a&gt;[/b] notation.</Notes>
143 :     <DataGen testCount="3">'http://www.conservativecat.com/Ferdy/TestTarget.php?Source=' . $this->{id} .
144 :     "&amp;Number=" . IntGen(1,99)</DataGen>
145 :     </Field>
146 :     </Fields>
147 :     </Entity>
148 :     <Entity name="Role" keyType="string">
149 :     <Notes>A [i]role[/i] describes a biological function that may be fulfilled by a feature.
150 :     One of the main goals of the database is to record the roles of the various features.</Notes>
151 :     <Fields>
152 :     <Field name="name" type="string" relation="RoleName">
153 :     <Notes>Expanded name of the role. This value is generally only available for roles
154 :     that are encoded as EC numbers.</Notes>
155 :     <DataGen testCount="1">StringGen(IntGen(20,40)) . "(" . $this->{id} . ")"</DataGen>
156 :     </Field>
157 :     </Fields>
158 :     </Entity>
159 :     <Entity name="Annotation" keyType="name-string">
160 :     <Notes>An [i]annotation[/i] contains supplementary information about a feature. Annotations
161 :     are currently the only objects that may be inserted directly into the database. All other
162 :     information is loaded from data exported by the SEED.
163 :     [p]Each annotation is associated with a target [b]Feature[/b]. The key of the annotation
164 :     is the target feature ID followed by a timestamp.</Notes>
165 :     <Fields>
166 :     <Field name="time" type="date">
167 :     <Notes>Date and time of the annotation.</Notes>
168 :     </Field>
169 :     <Field name="annotation" type="text">
170 :     <Notes>Text of the annotation.</Notes>
171 :     </Field>
172 :     </Fields>
173 :     </Entity>
174 :     <Entity name="Subsystem" keyType="name-string">
175 :     <Notes>A [i]subsystem[/i] is a collection of roles that work together in a cell. Identification of subsystems
176 :     is an important tool for recognizing parallel genetic features in different organisms.</Notes>
177 :     </Entity>
178 :     <Entity name="SSCell" keyType="name-string">
179 :     <Notes>Part of the process of locating and assigning features is creating a spreadsheet of
180 :     genomes and roles to which features are assigned. A [i]spreadsheet cell[/i] represents one
181 :     of the positions on the spreadsheet.</Notes>
182 :     </Entity>
183 :     <Entity name="SproutUser" keyType="name-string">
184 :     <Notes>A [i]user[/i] is a person who can make annotations and view data in the database. The
185 :     user object is keyed on the user's login name.</Notes>
186 :     <Fields>
187 :     <Field name="description" type="string">
188 :     <Notes>Full name or description of this user.</Notes>
189 :     </Field>
190 :     <Field name="access-code" type="key-string" relation="UserAccess">
191 :     <Notes>Access code possessed by this
192 :     user. A user can have many access codes; a genome is accessible to the user if its
193 :     access code matches any one of the user's access codes.</Notes>
194 :     <DataGen testCount="2">RandParam('low', 'medium', 'high')</DataGen>
195 :     </Field>
196 :     </Fields>
197 :     </Entity>
198 :     <Entity name="Property" keyType="int">
199 :     <Notes>A [i]property[/i] is a type of assertion that could be made about the properties of
200 :     a particular feature. Each property instance is a key/value pair and can be associated
201 :     with many different features. Conversely, a feature can be associated with many key/value
202 :     pairs, even some that notionally contradict each other. For example, there can be evidence
203 :     that a feature is essential to the organism's survival and evidence that it is superfluous.</Notes>
204 :     <Fields>
205 :     <Field name="property-name" type="name-string">
206 :     <Notes>Name of this property.</Notes>
207 :     </Field>
208 :     <Field name="property-value" type="string">
209 :     <Notes>Value associated with this property. For each property
210 :     name, there must by a property record for all of its possible
211 :     values.</Notes>
212 :     </Field>
213 :     </Fields>
214 :     <Indexes>
215 :     <Index>
216 :     <Notes>This index enables the application to find all values for a specified property
217 :     name, or any given name/value pair.</Notes>
218 :     <IndexFields>
219 :     <IndexField name="property-name" order="ascending" />
220 :     <IndexField name="property-value" order="ascending" />
221 :     </IndexFields>
222 :     </Index>
223 :     </Indexes>
224 :     </Entity>
225 :     <Entity name="Diagram" keyType="name-string">
226 :     <Notes>A functional diagram describes the chemical reactions, often comprising a single
227 :     subsystem. A diagram is identified by a short name and contains a longer descriptive name.
228 :     The actual diagram shows which functional roles guide the reactions along with the inputs
229 :     and outputs; the database, however, only indicates which roles belong to a particular
230 :     map.</Notes>
231 :     <Fields>
232 :     <Field name="name" type="text">
233 :     <Notes>Descriptive name of this diagram.</Notes>
234 :     </Field>
235 :     </Fields>
236 :     </Entity>
237 :     <Entity name="ExternalAliasOrg" keyType="name-string">
238 :     <Notes>An external alias is a feature name for a functional assignment that is not a
239 :     FIG ID. Functional assignments for external aliases are kept in a separate section of
240 :     the database. This table contains a description of the relevant organism for an
241 :     external alias functional assignment.</Notes>
242 :     <Fields>
243 :     <Field name="org" type="text">
244 :     <Notes>Descriptive name of the target organism for this external alias.</Notes>
245 :     </Field>
246 :     </Fields>
247 :     </Entity>
248 :     <Entity name="ExternalAliasFunc" keyType="name-string">
249 :     <Notes>An external alias is a feature name for a functional assignment that is not a
250 :     FIG ID. Functional assignments for external aliases are kept in a separate section of
251 :     the database. This table contains the functional role for the external alias functional
252 :     assignment.</Notes>
253 :     <Fields>
254 :     <Field name="func" type="text">
255 :     <Notes>Functional role for this external alias.</Notes>
256 :     </Field>
257 :     </Fields>
258 :     </Entity>
259 :     </Entities>
260 :     <Relationships>
261 :     <Relationship name="HasContig" from="Genome" to="Contig" arity="1M">
262 :     <Notes>This relationship connects a genome to the contigs that contain the actual genetic
263 :     information.</Notes>
264 :     </Relationship>
265 :     <Relationship name="ComesFrom" from="Genome" to="Source" arity="MM">
266 :     <Notes>This relationship connects a genome to the sources that mapped it. A genome can
267 :     come from a single source or from a cooperation among multiple sources.</Notes>
268 :     </Relationship>
269 :     <Relationship name="IsMadeUpOf" from="Contig" to="Sequence" arity="1M">
270 :     <Notes>A contig is stored in the database as an ordered set of sequences. By splitting the
271 :     contig into sequences, we get a performance boost from only needing to keep small portions
272 :     of a contig in memory at any one time. This relationship connects the contig to its
273 :     constituent sequences.</Notes>
274 :     <Fields>
275 :     <Field name="len" type="int">
276 :     <Notes>Length of the sequence.</Notes>
277 :     </Field>
278 :     <Field name="start-position" type="int">
279 :     <Notes>Index (1-based) of the point in the contig where this
280 :     sequence starts.</Notes>
281 :     </Field>
282 :     </Fields>
283 :     <FromIndex>
284 :     <Notes>This index enables the application to find all of the sequences in
285 :     a contig in order, and makes it easier to find a particular residue section.</Notes>
286 :     <IndexFields>
287 :     <IndexField name="start-position" order="ascending" />
288 :     <IndexField name="len" order="ascending" />
289 :     </IndexFields>
290 :     </FromIndex>
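<!-- Illustrative lookup using the index above. The values are hypothetical and are
     not part of the schema. Suppose a contig is stored as three sequences:
       start-position = 1,   len = 400
       start-position = 401, len = 400
       start-position = 801, len = 350
     Residue R belongs to the sequence with
       start-position <= R <= start-position + len - 1.
     For R = 950 that is the third sequence (801 <= 950 <= 1150), and R is
     character 950 - 801 + 1 = 150 of that sequence's residue string. -->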
291 :     </Relationship>
292 :     <Relationship name="IsTargetOfAnnotation" from="Feature" to="Annotation" arity="1M">
293 :     <Notes>This relationship connects a feature to its annotations.</Notes>
294 :     </Relationship>
295 :     <Relationship name="MadeAnnotation" from="SproutUser" to="Annotation" arity="1M">
296 :     <Notes>This relationship connects an annotation to the user who made it.</Notes>
297 :     </Relationship>
298 :     <Relationship name="ParticipatesIn" from="Genome" to="Subsystem" arity="MM">
299 :     <Notes>This relationship connects subsystems to the genomes that use
300 :     it. If the subsystem has been curated for the genome, then the subsystem's roles will also be
301 :     connected to the genome features through the [b]SSCell[/b] object.</Notes>
302 :     </Relationship>
303 :     <Relationship name="OccursInSubsystem" from="Role" to="Subsystem" arity="MM">
304 :     <Notes>This relationship connects roles to the subsystems that implement them. </Notes>
305 :     </Relationship>
306 :     <Relationship name="IsGenomeOf" from="Genome" to="SSCell" arity="1M">
307 :     <Notes>This relationship connects a subsystem's spreadsheet cell to the
308 :     genome for the spreadsheet column.</Notes>
309 :     </Relationship>
310 :     <Relationship name="IsRoleOf" from="Role" to="SSCell" arity="1M">
311 :     <Notes>This relationship connects a subsystem's spreadsheet cell to the
312 :     role for the spreadsheet row.</Notes>
313 :     </Relationship>
314 :     <Relationship name="ContainsFeature" from="SSCell" to="Feature" arity="MM">
315 :     <Notes>This relationship connects a subsystem's spreadsheet cell to the
316 :     features assigned to it.</Notes>
317 :     </Relationship>
318 :     <Relationship name="IsLocatedIn" from="Feature" to="Contig" arity="MM">
319 :     <Notes>This relationship connects a feature to the contig segments that work together
320 :     to effect it. The segments are numbered sequentially starting from 1. The database is
321 :     required to place an upper limit on the length of each segment. If a segment is longer
322 :     than the maximum, it can be broken into smaller bits.
323 :     [p]The upper limit enables applications to locate all features that contain a specific
324 :     residue. For example, if the upper limit is 100 and we are looking for a feature that
325 :     contains residue 234 of contig [b]ABC[/b], we can look for features with a begin point
326 :     between 135 and 333. The results can then be filtered by direction and length of the
327 :     segment.</Notes>
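<!-- Worked version of the example in the note above. This is an illustrative sketch,
     not part of the schema. It assumes a backward segment runs from beg down to
     beg - len + 1, which is what the symmetric window in the note implies.
     With upper limit MAX = 100 and target residue R = 234:
       forward segment (dir = +) covers beg .. beg + len - 1,
         so beg is at least R - (MAX - 1) = 135
       backward segment (dir = -) covers beg - len + 1 .. beg,
         so beg is at most R + (MAX - 1) = 333
     Every segment containing R therefore has 135 <= beg <= 333. Each candidate in
     that window still has to be checked against its own dir and len:
       dir = +  contains R when beg <= R and R <= beg + len - 1
       dir = -  contains R when beg - len + 1 <= R and R <= beg -->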
328 :     <Fields>
329 :     <Field name="locN" type="int">
330 :     <Notes>Sequence number of this segment.</Notes>
331 :     </Field>
332 :     <Field name="beg" type="int">
333 :     <Notes>Index (1-based) of the first residue in the contig that
334 :     belongs to the segment.</Notes>
335 :     </Field>
336 :     <Field name="len" type="int">
337 :     <Notes>Number of residues in the segment. A length of 0 identifies
338 :     a specific point between residues. This is the point before the residue if the direction
339 :     is forward and the point after the residue if the direction is backward.</Notes>
340 :     </Field>
341 :     <Field name="dir" type="char">
342 :     <Notes>Direction of the segment: [b]+[/b] if it is forward and
343 :     [b]-[/b] if it is backward.</Notes>
344 :     </Field>
345 :     </Fields>
346 :     <FromIndex Unique="false">
347 :     <Notes>This index allows the application to find all the segments of a feature in
348 :     the proper order.</Notes>
349 :     <IndexFields>
350 :     <IndexField name="locN" order="ascending" />
351 :     </IndexFields>
352 :     </FromIndex>
353 :     <ToIndex>
354 :     <Notes>This index is the one used by applications to find all the feature
355 :     segments that contain a specific residue.</Notes>
356 :     <IndexFields>
357 :     <IndexField name="beg" order="ascending" />
358 :     </IndexFields>
359 :     </ToIndex>
360 :     </Relationship>
361 :     <Relationship name="IsClusteredOnChromosomeWith" from="Feature" to="Feature" arity="MM">
362 :     <Notes>This relationship is one of two that relate features to each other. It connects
363 :     features that are physically close to each other on a single chromosome.</Notes>
364 :     <Fields>
365 :     <Field name="score" type="int">
366 :     <Notes>The number of co-occurrences in genomes that are not
367 :     extremely closely-related.</Notes>
368 :     </Field>
369 :     </Fields>
370 :     </Relationship>
371 :     <Relationship name="IsBidirectionalBestHitOf" from="Feature" to="Feature" arity="MM">
372 :     <Notes>This relationship is one of two that relate features to each other. It
373 :     connects features that are very similar but on separate genomes. A
374 :     bidirectional best hit relationship exists between two features [b]A[/b]
375 :     and [b]B[/b] if [b]A[/b] is the best match for [b]B[/b] on [b]A[/b]'s genome
376 :     and [b]B[/b] is the best match for [b]A[/b] on [b]B[/b]'s genome. </Notes>
377 :     <Fields>
378 :     <Field name="genome" type="name-string">
379 :     <Notes>ID of the genome containing the target (to) feature.</Notes>
380 :     </Field>
381 :     <Field name="sc" type="float">
382 :     <Notes>Score for this relationship.</Notes>
383 :     </Field>
384 :     </Fields>
385 :     <FromIndex>
386 :     <Notes>This index allows the application to find a feature's best hit for
387 :     a specific target genome.</Notes>
388 :     <IndexFields>
389 :     <IndexField name="genome" order="ascending" />
390 :     </IndexFields>
391 :     </FromIndex>
392 :     </Relationship>
393 :     <Relationship name="HasProperty" from="Feature" to="Property" arity="MM">
394 :     <Notes>This relationship connects a feature to its known property values.
395 :     The relationship contains text data that indicates the paper or organization
396 :     that discovered evidence that the feature possesses the property. So, for
397 :     example, if two papers presented evidence that a feature is essential,
398 :     there would be an instance of this relationship for each paper.</Notes>
399 :     <Fields>
400 :     <Field name="evidence" type="text">
401 :     <Notes>URL or citation of the paper or
402 :     institution that reported evidence of the relevant feature possessing
403 :     the specified property value.</Notes>
404 :     </Field>
405 :     </Fields>
406 :     </Relationship>
407 :     <Relationship name="RoleOccursIn" from="Role" to="Diagram" arity="MM">
408 :     <Notes>This relationship connects a role to the diagrams on which it
409 :     appears. A role frequently identifies an enzyme, and can appear in many
410 :     diagrams. A diagram generally contains many different roles.</Notes>
411 :     </Relationship>
412 :     <Relationship name="HasSSCell" from="Subsystem" to="SSCell" arity="1M">
413 :     <Notes>This relationship connects a subsystem to the spreadsheet cells
414 :     used to analyze and display it. The cells themselves can be thought of
415 :     as a grid with Roles on one axis and Genomes on the other. The
416 :     various features of the subsystem are then assigned to the cells.</Notes>
417 :     </Relationship>
418 :     <Relationship name="IsTrustedBy" from="SproutUser" to="SproutUser" arity="MM">
419 :     <Notes>This relationship identifies the users trusted by each
420 :     particular user. When viewing functional assignments, the
421 :     assignment displayed is the most recent one by a user trusted
422 :     by the current user. The current user implicitly trusts himself.
423 :     If no trusted users are specified in the database, the user
424 :     also implicitly trusts the user [b]FIG[/b].</Notes>
425 :     </Relationship>
426 :     </Relationships>
427 :     </Database>
