[Bio] / Sprout / Sapling.pm Repository:
ViewVC logotype

Annotation of /Sprout/Sapling.pm

Parent Directory Parent Directory | Revision Log Revision Log


Revision 1.39 - (view) (download) (as text)

1 : parrello 1.1 #!/usr/bin/perl -w
2 :    
3 :     #
4 :     # Copyright (c) 2003-2006 University of Chicago and Fellowship
5 :     # for Interpretations of Genomes. All Rights Reserved.
6 :     #
7 :     # This file is part of the SEED Toolkit.
8 :     #
9 :     # The SEED Toolkit is free software. You can redistribute
10 :     # it and/or modify it under the terms of the SEED Toolkit
11 :     # Public License.
12 :     #
13 :     # You should have received a copy of the SEED Toolkit Public License
14 :     # along with this program; if not write to the University of Chicago
15 :     # at info@ci.uchicago.edu or the Fellowship for Interpretation of
16 :     # Genomes at veronika@thefig.info or download a copy from
17 :     # http://www.theseed.org/LICENSE.TXT.
18 :     #
19 :    
20 :     package Sapling;
21 :    
22 :     use strict;
23 :     use Tracer;
24 : parrello 1.11 use base qw(ERDB);
25 : parrello 1.1 use Stats;
26 : parrello 1.9 use DBKernel;
27 : parrello 1.11 use SeedUtils;
28 :     use BasicLocation;
29 : parrello 1.1 use XML::Simple;
30 : parrello 1.27 use Digest::MD5;
31 : parrello 1.1
32 :     =head1 Sapling Package
33 :    
34 :     Sapling Database Access Methods
35 :    
36 :     =head2 Introduction
37 :    
38 : parrello 1.11 The Sapling database is a new Entity-Relationship Database that attempts to
39 :     encapsulate our data in a portable form for distribution. It is loaded directly
40 :     from the genomes and subsystems of the SEED. This object has minimal
41 :     capabilities: most of its power comes the L<ERDB> base class.
42 : parrello 1.1
43 :     The fields in this object are as follows.
44 :    
45 :     =over 4
46 :    
47 :     =item loadDirectory
48 :    
49 :     Name of the directory containing the files used by the loaders.
50 :    
51 :     =item loaderSource
52 :    
53 : parrello 1.11 Source object for the loaders (a L<FIG> in our case).
54 : parrello 1.1
55 :     =item genomeHash
56 :    
57 :     Reference to a hash of the genomes to include when loading.
58 :    
59 :     =item subHash
60 :    
61 :     Reference to a hash of the subsystems to include when loading.
62 :    
63 :     =item tuning
64 :    
65 :     Reference to a hash of tuning parameters.
66 :    
67 : parrello 1.16 =item otuHash
68 :    
69 :     Reference to a hash that maps genome IDs to genome set names.
70 :    
71 : parrello 1.1 =back
72 :    
73 : parrello 1.8 =head2 Configuration and Construction
74 : parrello 1.1
75 :     The default loading profile for the Sapling database is to include all complete
76 :     genomes and all usable subsystems. This can be overridden by specifying a list of
77 :     genomes and subsystems in an XML configuration file. The file name should be
78 :     C<SaplingConfig.xml> in the specified data directory. The document element should
79 :     be C<Sapling>, and it has two sub-elements. The C<Genomes> element should contain as
80 :     its text a space-delimited list of genome IDs. The <Subsystems> element should contain
81 :     a list of subsystem names, one per line. If a particular section is missing, the
82 :     default list will be used.
83 :    
84 :     =head3 Example
85 :    
86 :     The following configuration file specifies 10 genomes and 6 subsystems.
87 :    
88 :     <Sapling>
89 :     <Genomes>
90 :     100226.1 31033.3 31964.1 36873.1 126740.4
91 :     155864.1 349307.7 350058.5 351348.5 412694.5
92 :     </Genomes>
93 :     <Subsystems>
94 :     Sugar_utilization_in_Thermotogales
95 :     Coenzyme_F420_hydrogenase
96 :     Ribosome_activity_modulation
97 :     prophage_tails
98 :     CBSS-393130.3.peg.794
99 :     Apigenin_derivatives
100 :     </Subsystems>
101 :     </Sapling>
102 :    
103 :     The XML file also contains tuning parameters that affect the way the data
104 :     is loaded. These are specified as attributes in the TuningParameters element,
105 :     as follows.
106 :    
107 :     =over 4
108 :    
109 :     =item maxLocationLength
110 :    
111 :     The maximum number of base pairs allowed in a single location. B<IsLocatedIn>
112 :     records are split into sections based on this length, so when you are looking
113 :     for all the features in a particular neighborhood, you can look for locations
114 :     within the maximum location distance from the neighborhood, and even if you have
115 :     a huge operon that contains tens of thousands of base pairs, you'll still be
116 :     able to find it.
117 :    
118 : parrello 1.4 =item maxSequenceLength
119 :    
120 :     The maximum number of base pairs allowed in a single DNA sequence. DNA sequences
121 :     are broken into segments to prevent excessively large genomes from clogging
122 :     memory during sequence resolution.
123 :    
124 : parrello 1.1 =back
125 :    
126 :     =head3 Global Section Constant
127 :    
128 :     Each section of the database used by the loader corresponds to a single genome.
129 :     The global section is loaded after all the others, and is concerned with data
130 :     not related to a particular genome.
131 :    
132 :     =cut
133 :    
134 :     # Name of the global section
135 :     use constant GLOBAL => 'Globals';
136 :    
137 :     =head3 Tuning Parameter Defaults
138 :    
139 :     Each tuning parameter must have a default value, in case it is not present in
140 :     the XML configuration file. The defaults are specified in a constant hash
141 :     reference called C<TUNING_DEFAULTS>.
142 :    
143 :     =cut
144 :    
145 :     use constant TUNING_DEFAULTS => {
146 : parrello 1.4 maxLocationLength => 4000,
147 : parrello 1.27 maxSequenceLength => 10000,
148 : parrello 1.1 };
149 :    
150 :     =head3 new
151 :    
152 :     my $sap = Sapling->new(%options);
153 :    
154 :     Construct a new Sapling object. The following options are supported.
155 :    
156 :     =over 4
157 :    
158 :     =item loadDirectory
159 :    
160 :     Data directory to be used by the loaders.
161 :    
162 : parrello 1.7 =item DBD
163 : parrello 1.1
164 :     XML database definition file.
165 :    
166 :     =item dbName
167 :    
168 :     Name of the database to use.
169 :    
170 :     =item sock
171 :    
172 :     Socket for accessing the database.
173 :    
174 :     =item userData
175 :    
176 :     Name and password used to log on to the database, separated by a slash.
177 :    
178 :     =item dbhost
179 :    
180 :     Database host name.
181 :    
182 : parrello 1.25 =item port
183 :    
184 : parrello 1.35 MYSQL port number to use (MySQL only).
185 :    
186 :     =item dbms
187 :    
188 :     Database management system to use (e.g. C<SQLite> or C<postgres>, default C<mysql>).
189 : parrello 1.25
190 : parrello 1.1 =back
191 :    
192 :     =cut
193 :    
194 :     sub new {
195 :     # Get the parameters.
196 :     my ($class, %options) = @_;
197 :     # Get the options.
198 : parrello 1.5 my $loadDirectory = $options{loadDirectory} || $FIG_Config::saplingData ||
199 :     "$FIG_Config::fig/SaplingData";
200 : parrello 1.7 my $dbd = $options{DBD} || "$loadDirectory/SaplingDBD.xml";
201 : olson 1.6 my $dbName = $options{dbName} || $FIG_Config::saplingDB || "nmpdr_sapling";
202 : parrello 1.3 my $userData = $options{userData} || "seed/";
203 :     my $dbhost = $options{dbhost} || $FIG_Config::saplingHost || "localhost";
204 : parrello 1.25 my $port = $options{port} || 3306;
205 : parrello 1.35 my $dbms = $options{dbms} || 'mysql';
206 : parrello 1.25 # Compute the socket. An empty string is a valid override here.
207 :     my $sock = $options{sock};
208 :     if (! defined $sock) {
209 :     $sock = $FIG_Config::sproutSock || "";
210 :     }
211 : parrello 1.1 # Compute the user name and password.
212 :     my ($user, $pass) = split '/', $userData, 2;
213 :     $pass = "" if ! defined $pass;
214 : parrello 1.24 Trace("Connecting to sapling database.") if T(2);
215 : parrello 1.1 # Connect to the database.
216 : parrello 1.35 my $dbh = DBKernel->new($dbms, $dbName, $user, $pass, $port, $dbhost, $sock);
217 : parrello 1.1 # Create the ERDB object.
218 : parrello 1.2 my $retVal = ERDB::new($class, $dbh, $dbd, %options);
219 : parrello 1.1 # Add the load directory pointer.
220 :     $retVal->{loadDirectory} = $loadDirectory;
221 :     # Set up the spaces for the loader source object, the subsystem hash, the
222 :     # genome hash, and the tuning parameters.
223 :     $retVal->{source} = undef;
224 :     $retVal->{genomeHash} = undef;
225 :     $retVal->{subHash} = undef;
226 :     $retVal->{tuning} = undef;
227 : parrello 1.16 # Set up the hash of genome IDs to OTUs.
228 :     $retVal->{otuHash} = {};
229 : parrello 1.1 # Return it.
230 :     return $retVal;
231 :     }
232 :    
233 : parrello 1.8 =head2 Public Methods
234 :    
235 : parrello 1.16 =head3 OTU
236 :    
237 :     my $otu = $sap->OTU($genomeID);
238 :    
239 :     Return the name of the Organism Taxonomic Unit (GenomeSet) for the
240 :     specified genome ID. OTU information is cached in memory, so that once it
241 :     is known, it does not need to be re-fetched from the database.
242 :    
243 :     =over 4
244 :    
245 :     =item genomeID
246 :    
247 :     ID of a genome or feature. If a feature ID is specified, the genome ID will be
248 :     extracted from it.
249 :    
250 :     =item RETURN
251 :    
252 :     Returns the name of the genome set for the specified genome, or C<undef> if the
253 :     genome is not in the
254 :    
255 :     =back
256 :    
257 :     =cut
258 :    
259 :     sub OTU {
260 :     # Get the parameters.
261 :     my ($self, $genomeID) = @_;
262 :     # Get the OTU hash.
263 :     my $otuHash = $self->{otuHash};
264 :     # Compute the real genome ID.
265 :     my $realGenomeID = ($genomeID =~ /^fig\|(\d+\.\d+)/ ? $1 : $genomeID);
266 :     # Look it up in the hash.
267 :     my $retVal = $otuHash->{$realGenomeID};
268 :     # Was it found?
269 :     if (! defined $retVal) {
270 :     # No, get the OTU from the database.
271 :     ($retVal) = $self->GetFlat("IsCollectedInto", "IsCollectedInto(from-link) = ?",
272 :     [$realGenomeID], "to-link");
273 :     # Save it in the hash for future use.
274 :     $otuHash->{$realGenomeID} = $retVal;
275 :     }
276 :     # Return the result.
277 :     return $retVal;
278 :     }
279 :    
280 : parrello 1.27 =head3 ProteinID
281 :    
282 :     my $key = $sap->ProteinID($sequence);
283 :    
284 :     Return the protein sequence ID that would be associated with a specific
285 :     protein sequence.
286 :    
287 :     =over 4
288 :    
289 :     =item sequence
290 :    
291 :     String containing the protein sequence in question.
292 :    
293 :     =item RETURN
294 :    
295 :     Returns the ID value for the specified protein sequence. If the sequence exists
296 :     in the database, it will have this ID in the B<ProteinSequence> table.
297 :    
298 :     =back
299 :    
300 :     =cut
301 :    
302 :     sub ProteinID {
303 :     # Get the parameters.
304 :     my ($self, $sequence) = @_;
305 :     # Compute the MD5 hash.
306 :     my $retVal = Digest::MD5::md5_hex($sequence);
307 :     # Return the result.
308 :     return $retVal;
309 :     }
310 :    
311 :    
312 : parrello 1.28 =head3 IsProteinID
313 :    
314 :     my $md5 = $sap->IsProteinID($identifier);
315 :    
316 :     Check for a protein identifier. If a protein identifier is found, the
317 :     corresponding protein sequence ID will be returned; otherwise, an
318 :     undefined value will be returned. A protein identifier is either a
319 :     raw protein sequence ID, an ID preceded by C<md5|>, or an ID preceded by
320 :     C<gnl|md5|>
321 :    
322 :     =over 4
323 :    
324 :     =item identifier
325 :    
326 :     Identifier to test.
327 :    
328 :     =item RETURN
329 :    
330 :     Returns the MD5 code from the protein identifier, or C<undef> if the incoming
331 :     string is not a protein identifier.
332 :    
333 :     =back
334 :    
335 :     =cut
336 :    
337 :     sub IsProteinID {
338 :     # Get the parameters.
339 :     my ($self, $identifier) = @_;
340 :     # Declare the return variable.
341 :     my $retVal;
342 :     # Check the input.
343 :     if ($identifier =~ /^(?:gnl\|)?(?:md5\|)?([0-9a-f]{32})$/) {
344 :     $retVal = $1;
345 :     }
346 :     # Return the result.
347 :     return $retVal;
348 :     }
349 :    
350 :    
351 : parrello 1.18 =head3 Assignment
352 :    
353 :     my $assignment = $sapling->Assignment($fid);
354 :    
355 :     Return the functional assignment for the specified feature.
356 :    
357 :     =over 4
358 :    
359 :     =item fid
360 :    
361 :     FIG ID of the desired feature.
362 :    
363 :     =item RETURN
364 :    
365 :     Returns the functional assignment of the specified feature, or C<undef>
366 :     if the feature does not exist.
367 :    
368 :     =back
369 :    
370 :     =cut
371 :    
372 :     sub Assignment {
373 :     # Get the parameters.
374 :     my ($self, $fid) = @_;
375 :     # Get the functional assignment.
376 :     my ($retVal) = $self->GetFlat("Feature", "Feature(id) = ?", [$fid], 'function');
377 :     # Return the result.
378 :     return $retVal;
379 :     }
380 :    
381 : parrello 1.27 =head3 IdsForProtein
382 :    
383 :     my @ids = $sap->IdsForProtein($protID);
384 :    
385 :     Return a list of all the identifiers associated with the specified
386 :     protein.
387 :    
388 :     =over 4
389 :    
390 :     =item protID
391 :    
392 :     ID of the protein of interest.
393 :    
394 :     =item RETURN
395 :    
396 :     Returns a list of the Identifiers for the specific protein or for genes that
397 :     produce the specific protein.
398 :    
399 :     =back
400 :    
401 :     =cut
402 :    
403 :     sub IdsForProtein {
404 :     # Get the parameters.
405 :     my ($self, $protID) = @_;
406 :     # We'll put the identifiers found in here.
407 :     my %retVal;
408 :     # Ask for identifiers that directly name the protein.
409 :     for my $id ($self->GetFlat("ProteinSequence IsNamedBy Identifier",
410 :     "ProteinSequence(id) = ?", [$protID],
411 :     'Identifier(id)')) {
412 :     $retVal{$id} = 1;
413 :     }
414 :     # Add identifiers that name genes producing the protein.
415 :     for my $id ($self->GetFlat("ProteinSequence IsProteinFor Feature IsIdentifiedBy Identifier",
416 :     "ProteinSequence(id) = ?", [$protID],
417 :     'Identifier(id)')) {
418 :     $retVal{$id} = 1;
419 :     }
420 :     # Return the results found.
421 :     return sort keys %retVal;
422 :     }
423 :    
424 : parrello 1.16
425 : parrello 1.11 =head3 ComputeDNA
426 :    
427 :     my $dna = $sap->ComputeDNA($location);
428 :    
429 :     Return the DNA sequence for the specified location.
430 :    
431 :     =over 4
432 :    
433 :     =item location
434 :    
435 :     A L<BasicLocation> object indicating the contig, start location, direction, and
436 :     length of the desired DNA segment.
437 :    
438 :     =item RETURN
439 :    
440 : parrello 1.13 Returns a string containing the desired DNA. The DNA comes back in pure lower-case.
441 : parrello 1.11
442 :     =back
443 :    
444 :     =cut
445 :    
446 :     sub ComputeDNA {
447 :     # Get the parameters.
448 :     my ($self, $location) = @_;
449 : parrello 1.20 # Get the contig, left end, and right end of the location. Note we subtract
450 :     # 1 to convert contig positions to string offsets.
451 : parrello 1.11 my $contig = $location->Contig;
452 : parrello 1.20 my $left = $location->Left - 1;
453 :     my $right = $location->Right - 1;
454 : parrello 1.31 # Insure the left location is valid.
455 :     if ($left < 0) {
456 :     $left = 0;
457 :     }
458 : parrello 1.11 # Get the DNA segment length.
459 :     my $maxSequenceLength = $self->TuningParameter("maxSequenceLength");
460 :     # Compute the key of the first segment of our DNA and the starting
461 :     # point in that segment.
462 :     my $leftOffset = $left % $maxSequenceLength;
463 :     my $leftKey = "$contig:" . Tracer::Pad(($left - $leftOffset)/$maxSequenceLength,
464 :     7, 1, '0');
465 :     # Compute the key of the last segment containing our DNA.
466 :     my $rightKey = "$contig:" . Tracer::Pad(int($right/$maxSequenceLength), 7, 1, '0');
467 :     my @results = $self->GetFlat("DNASequence",
468 : parrello 1.26 'DNASequence(id) >= ? AND DNASequence(id) <= ?',
469 : parrello 1.11 [$leftKey, $rightKey], 'sequence');
470 :     # Form all the DNA into a string and extract our piece.
471 :     my $retVal = substr(join("", @results), $leftOffset, $location->Length);
472 :     # If this is a backwards string, we need the reverse complement.
473 :     rev_comp(\$retVal) if $location->Dir eq '-';
474 :     # Return the result.
475 :     return $retVal;
476 :     }
477 :    
478 : parrello 1.12 =head3 FilterByGenome
479 :    
480 :     my @filteredFids = $sapling->FilterByGenome(\@fids, $genomeFilter);
481 :    
482 :     Filter the features using the specified genome-based criterion. The
483 :     criterion can be either a comma-separated list of genome IDs, or a
484 :     partial organism name.
485 :    
486 :     =over 4
487 :    
488 :     =item fids
489 :    
490 :     Reference to a list of feature IDs.
491 :    
492 :     =item genomeFilter
493 :    
494 :     A string specifying the filtering criterion. If undefined or blank, then
495 :     no filter is applied. If a name, then only features from genomes with a
496 :     matching name will be returned. A name is a match if the filter is an
497 :     exact match for some prefix of the organism name. Thus, C<Listeria> would
498 :     get all Listerias, while C<Listeria monocytogenes EGD-e> would match only
499 :     the specific EGD-e strain. For a more precise match, you can specify
500 :     instead a comma-delimited list of genome IDs. In this latter case, only
501 :     features for the listed genomes will be included in the results.
502 :    
503 :     =item RETURN
504 :    
505 :     Returns the features from the incoming list that match the filter condition.
506 :    
507 :     =back
508 :    
509 :     =cut
510 :    
511 :     sub FilterByGenome {
512 :     # Get the parameters.
513 :     my ($self, $fids, $genomeFilter) = @_;
514 :     # Declare the return variable.
515 :     my @retVal;
516 :     # Check the type of filter.
517 :     if (! $genomeFilter) {
518 :     # No filter, so copy the input directly to the result.
519 :     @retVal = @$fids;
520 :     } else {
521 :     # Trim edge spaces from the filter.
522 :     $genomeFilter = Tracer::Trim($genomeFilter);
523 :     # This hash will contain the permissible genome IDs.
524 :     my %genomeIDs;
525 :     # Check for a name. We assume we have a name if there's an
526 :     # alphabetic letter anywhere in the filter string.
527 :     if ($genomeFilter =~ /[a-zA-Z]/) {
528 :     # The filter contains something that does not look like a genome
529 :     # ID, so it is treated as a genome name. We get the IDs of the
530 :     # genomes with that name and put them in the hash.
531 :     %genomeIDs = map { $_ => 1 } $self->GetFlat("Genome",
532 :     'Genome(scientific-name) LIKE ?',
533 :     ["$genomeFilter%"], 'id');
534 :     } else {
535 :     # We are expecting a comma-delimited list of genome IDs, so we
536 :     # put these in our hash.
537 :     %genomeIDs = map { $_ => 1 } split(/\s*,\s*/, $genomeFilter);
538 :     }
539 :     # Now we loop through the features, keeping the ones whose genome ID
540 :     # matches something in the hash.
541 :     @retVal = grep { $genomeIDs{genome_of($_)} } @$fids;
542 :     }
543 :     # Return the result.
544 :     return @retVal;
545 :     }
546 :    
547 :    
548 : parrello 1.11 =head3 GetLocations
549 :    
550 :     my @locs = $sapling->GetLocations($fid);
551 :    
552 :     Return the locations of the DNA for the specified feature.
553 :    
554 :     =over 4
555 :    
556 :     =item fid
557 :    
558 :     ID of the feature whose location is desired.
559 :    
560 :     =item RETURN
561 :    
562 :     Returns a list of L<BasicLocation> objects for the locations containing the
563 :     feature's DNA.
564 :    
565 :     =back
566 :    
567 :     =cut
568 :    
569 :     sub GetLocations {
570 :     # Get the parameters.
571 :     my ($self, $fid) = @_;
572 :     # Declare the return variable.
573 :     my @retVal;
574 : parrello 1.23 # This will contain the last location found.
575 :     my $lastLoc;
576 : parrello 1.11 # Get this feature's locations.
577 :     my $qh = $self->Get("IsLocatedIn",
578 :     'IsLocatedIn(from-link) = ? ORDER BY IsLocatedIn(ordinal)',
579 :     [$fid]);
580 :     while (my $resultRow = $qh->Fetch()) {
581 :     # Compute the contig ID and other information.
582 :     my $contig = $resultRow->PrimaryValue('to-link');
583 :     my $begin = $resultRow->PrimaryValue('begin');
584 :     my $dir = $resultRow->PrimaryValue('dir');
585 :     my $len = $resultRow->PrimaryValue('len');
586 :     # Create a location from the location information.
587 :     my $start = ($dir eq '+' ? $begin : $begin + $len - 1);
588 : parrello 1.23 my $loc = BasicLocation->new($contig, $start, $dir, $len);
589 :     # Check to see if this location is adjacent to the previous one.
590 :     if ($lastLoc && $lastLoc->Adjacent($loc)) {
591 :     # It is, so merge it in.
592 :     $lastLoc->Merge($loc);
593 :     } else {
594 :     # It isn't, so push the new one on the list.
595 :     $lastLoc = $loc;
596 :     push @retVal, $loc;
597 :     }
598 : parrello 1.11 }
599 :     # Return the result.
600 :     return @retVal;
601 :     }
602 :    
603 :    
604 : parrello 1.8 =head3 IdentifiedProtein
605 :    
606 :     my $proteinID = $sap->IdentifiedProtein($id);
607 :    
608 :     Compute the protein for a specified identifier. If the identifier does
609 :     not exist or does not identify a protein, this method will return
610 :     C<undef>.
611 :    
612 :     =over 4
613 :    
614 :     =item id
615 :    
616 :     Identifier whose protein is desired.
617 :    
618 :     =item RETURN
619 :    
620 :     Returns the protein ID corresponding to the incoming identifier,
621 :     or C<undef> if the identifier does not exist or is not for a protein.
622 :    
623 :     =back
624 :    
625 :     =cut
626 :    
627 :     sub IdentifiedProtein {
628 :     # Get the parameters.
629 :     my ($self, $id) = @_;
630 :     # Declare the return variable.
631 :     my $retVal;
632 :     # Try to find a protein for this ID.
633 :     my ($proteinID) = $self->GetFlat("Identifier Names ProteinSequence",
634 :     "Identifier(id) = ?", [$id],
635 :     'ProteinSequence(id)');
636 :     if (defined $proteinID) {
637 :     # We found one, so we're done.
638 :     $retVal = $proteinID;
639 :     } else {
640 :     # Not a protein ID. See if it's the ID of a feature that has a
641 :     # protein connected. Note that it's possible to find more than one,
642 :     # but we're going to punt and pick the first.
643 :     ($proteinID) = $self->GetFlat("Identifier Identifies Feature Produces ProteinSequence",
644 :     "Identifier(id) = ? LIMIT 1", [$id],
645 :     'ProteinSequence(id)');
646 :     if (defined $proteinID) {
647 :     # We found a protein ID, so return it.
648 :     $retVal = $proteinID;
649 :     }
650 :     }
651 :     # Return the result.
652 :     return $retVal;
653 :     }
654 :    
655 : parrello 1.9 =head3 FeaturesByID
656 :    
657 :     my @fids = $sapling->FeaturesByID($id);
658 :    
659 :     Return all the features corresponding to the specified identifier. Only features
660 :     that represent the same locus will be returned.
661 :    
662 :     =over 4
663 :    
664 :     =item id
665 :    
666 :     Identifier of interest.
667 :    
668 :     =item RETURN
669 :    
670 :     Returns a list of all the features in the database that match the given
671 :     identifier.
672 :    
673 :     =back
674 :    
675 :     =cut
676 :    
677 :     sub FeaturesByID {
678 :     # Get the parameters.
679 :     my ($self, $id) = @_;
680 :     # Ask for features from the database.
681 :     my @retVal = $self->GetFlat("Identifies", "Identifies(from-link) = ?", [$id],
682 :     'to-link');
683 :     # Return the result.
684 :     return @retVal;
685 :     }
686 :    
687 :     =head3 ProteinsByID
688 :    
689 :     my @fids = $sapling->ProteinsByID($id);
690 :    
691 :     Return all the features that have the same protein sequence as the
692 : parrello 1.10 identified feature. The returned features mar or may not have the same locus. If
693 :     the identifier is not for a protein encoding gene, no result will be returned.
694 : parrello 1.9
695 :     =over 4
696 :    
697 :     =item id
698 :    
699 : parrello 1.10 Identifier of interest. This can be any alias identifier from the B<Identifier>
700 :     table (which includes the FIG ID).
701 : parrello 1.9
702 :     =item RETURN
703 :    
704 : parrello 1.10 Returns a list of FIG IDs for features having the same protein sequence. If the
705 :     identifier does not specify a protein-encoding gene, the list will be empty.
706 : parrello 1.9
707 :     =back
708 :    
709 :     =cut
710 :    
711 :     sub ProteinsByID {
712 :     # Get the parameters.
713 :     my ($self, $id) = @_;
714 :     # Declare the return variable.
715 :     my @retVal;
716 : parrello 1.10 # Compute the protein for this identifier.
717 :     my $protID = $self->IdentifiedProtein($id);
718 :     # Only proceed if a protein was found. If no protein was found, we're
719 :     # already set up to return an empty list.
720 :     if (defined $protID) {
721 :     # Get all the features connected to the identified protein.
722 :     @retVal = $self->GetFlat("IsProteinFor", "IsProteinFor(from-link) = ?",
723 :     [$protID], "IsProteinFor(to-link)");
724 :     }
725 : parrello 1.9 # Return the result.
726 :     return @retVal;
727 :     }
728 :    
729 :     =head3 GetSubsystem
730 :    
731 :     my $ssData = $sapling->GetSubsystem($ssName);
732 :    
733 : parrello 1.11 Return a L<SaplingSubsys> object for the named subsystem.
734 : parrello 1.9
735 :     =over 4
736 : parrello 1.8
737 : parrello 1.9 =item ssName
738 :    
739 :     Name of the desired subsystem.
740 :    
741 :     =item RETURN
742 :    
743 :     Returns an object that defines multiple useful methods for manipulating the
744 :     named subsystem.
745 :    
746 :     =back
747 :    
748 :     =cut
749 :    
750 :     sub GetSubsystem {
751 :     # Get the parameters.
752 :     my ($self, $ssName) = @_;
753 :     # Declare the return variable.
754 :     require SaplingSubsys;
755 :     my $retVal = SaplingSubsys->new($ssName, $self);
756 :     # Return the result.
757 :     return $retVal;
758 :     }
759 : parrello 1.8
760 :    
761 :     =head3 GenesInRegion
762 :    
763 :     my @pegs = $sap->GenesInRegion($location);
764 :    
765 :     Return a list of the IDs for the features that overlap the specified
766 :     region on a contig.
767 :    
768 :     =over 4
769 :    
770 :     =item location
771 :    
772 :     Location of interest, either in the form of a location string (e.g.
773 : parrello 1.11 C<360108.3:NZ_AANK01000002_264528_264007>) or a L<BasicLocation>
774 : parrello 1.8 object.
775 :    
776 :     =item RETURN
777 :    
778 :     Returns a list of feature IDs. The features in the list will be all
779 :     those that overlap or occur inside the location of interest.
780 :    
781 :     =back
782 :    
783 :     =cut
784 :    
785 :     sub GenesInRegion {
786 :     # Get the parameters.
787 :     my ($self, $location) = @_;
788 :     # Insure we have a location object.
789 :     my $locObject = (ref $location ? $location : BasicLocation->new($location));
790 :     # Get the beginning and the end of the location of interest.
791 :     my $begin = $locObject->Left();
792 :     my $end = $locObject->Right();
793 :     # For performance reasons, we limit the possible starting location, using the
794 :     # tuning parameter for maximum location length.
795 :     my $limit = $begin - $self->TuningParameter('maxLocationLength');
796 :     # Perform the query. Note we use a hash to eliminate duplicates.
797 :     my %retVal = map { $_ => 1 } $self->GetFlat('Contig IsLocusFor Feature',
798 :     "Contig(id) = ? AND IsLocusFor(begin) <= ? AND " .
799 :     "IsLocusFor(begin) > ? AND " .
800 :     "IsLocusFor(begin) + IsLocusFor(len) >= ?",
801 :     [$locObject->Contig(), $end, $limit, $begin],
802 :     'Feature(id)');
803 :     # Return the result.
804 :     return sort keys %retVal;
805 :     }
806 :    
807 :     =head3 GetFasta
808 :    
809 : parrello 1.9 my $fasta = $sapling->GetFasta($proteinID, $id, $comment);
810 : parrello 1.8
811 :     Return a FASTA sequence for the specified protein. An optional identifier
812 :     can be provided to be used as the identification string.
813 :    
814 :     =over 4
815 :    
816 :     =item proteinID
817 :    
818 :     Protein sequence identifier.
819 :    
820 :     =item id (optional)
821 :    
822 :     The identifier to be used in the FASTA output. If omitted, the protein ID
823 :     is used.
824 :    
825 : parrello 1.9 =item comment (optional)
826 :    
827 :     The comment string to be used in the identification line of the FASTA output.
828 :     If omitted, no comment will be present.
829 :    
830 : parrello 1.8 =item RETURN
831 :    
832 :     Returns a FASTA string for the protein. This includes the identification
833 : parrello 1.14 line and the protein letters themselves.
834 : parrello 1.8
835 :     =back
836 :    
837 :     =cut
838 :    
839 :     sub GetFasta {
840 :     # Get the parameters.
841 : parrello 1.9 my ($self, $proteinID, $id, $comment) = @_;
842 : parrello 1.11 # Compute the identifier.
843 : parrello 1.8 my $realID = $id || "md5|$proteinID";
844 :     # Declare the return variable.
845 :     my $retVal;
846 :     # Get the protein sequence.
847 :     my ($sequence) = $self->GetFlat("ProteinSequence", "ProteinSequence(id) = ?",
848 :     [$proteinID], "sequence");
849 :     # It's an error if the sequence was not found.
850 :     if (! defined $sequence) {
851 :     Confess("No protein found with the sequence identifier $proteinID.");
852 :     } else {
853 : parrello 1.11 # Create a FASTA string for the protein.
854 :     $retVal = SeedUtils::create_fasta_record($realID, $comment, $sequence);
855 : parrello 1.8 }
856 :     # Return the result.
857 :     return $retVal;
858 :     }
859 : parrello 1.1
860 :    
861 : parrello 1.4 =head3 Taxonomy
862 :    
863 : parrello 1.14 my @taxonomy = $sap->Taxonomy($genomeID, $format);
864 : parrello 1.4
865 :     Return the full taxonomy of the specified genome, starting from the
866 : parrello 1.14 domain downward.
867 : parrello 1.4
868 :     =over 4
869 :    
870 :     =item genomeID
871 :    
872 :     ID of the genome whose taxonomy is desired. The genome does not need to exist
873 :     in the database: the version number will be lopped off and the result used as
874 :     an entry point into the taxonomy tree.
875 :    
876 : parrello 1.14 =item format (optional)
877 :    
878 :     Format of the taxonomy. C<names> will return primary names, C<numbers> will
879 :     return taxonomy numbers, and C<both> will return taxonomy number followed by
880 :     primary name. The default is C<names>.
881 :    
882 : parrello 1.4 =item RETURN
883 :    
884 :     Returns a list of taxonomy names, starting from the domain and moving
885 :     down to the node where the genome is attached.
886 :    
887 :     =back
888 :    
889 :     =cut
890 :    
891 :     sub Taxonomy {
892 :     # Get the parameters.
893 : parrello 1.14 my ($self, $genomeID, $format) = @_;
894 : parrello 1.4 # Get the genome's taxonomic group.
895 :     my ($taxon) = split /\./, $genomeID, 2;
896 :     # We'll put the return data in here.
897 :     my @retVal;
898 :     # Loop until we hit a domain.
899 :     my $domainFlag;
900 :     while (! $domainFlag) {
901 :     # Get the data we need for this taxonomic group.
902 :     my ($taxonData) = $self->GetAll('TaxonomicGrouping IsInGroup',
903 :     'TaxonomicGrouping(id) = ?', [$taxon],
904 :     'domain scientific-name IsInGroup(to-link)');
905 :     # If we didn't find what we're looking for, then we have a problem. This
906 :     # would indicate a node below the domain level that doesn't have a parent
907 :     # or (more likely) an invalid input string.
908 :     if (! $taxonData) {
909 :     # Terminate the loop and trace a warning.
910 :     $domainFlag = 1;
911 :     Trace("Could not find node or parent for \"$taxon\".") if T(1);
912 :     } else {
913 :     # Extract the data for the current group. Note we overwrite our
914 :     # taxonomy ID with the ID of our parent, priming the next iteration
915 :     # of the loop.
916 :     my $name;
917 : parrello 1.14 my $oldTaxon = $taxon;
918 : parrello 1.4 ($domainFlag, $name, $taxon) = @$taxonData;
919 : parrello 1.14 # Compute the value we want to put in the output list.
920 :     my $value;
921 :     if ($format eq 'numbers') {
922 :     $value = $oldTaxon;
923 :     } elsif ($format eq 'both') {
924 :     $value = "$oldTaxon $name";
925 :     } else {
926 :     $value = $name;
927 :     }
928 :     # Put the current group's data in the return list.
929 :     unshift @retVal, $value;
930 : parrello 1.4 }
931 :     }
932 :     # Return the result.
933 :     return @retVal;
934 :     }
935 :    
936 : parrello 1.18 =head3 IsDeletedFid
937 :    
938 :     my $flag = $sapling->IsDeletedFid($fid);
939 :    
940 :     Return TRUE if the specified feature is B<not> in the database, else
941 :     FALSE.
942 :    
943 :     =over 4
944 :    
945 :     =item fid
946 :    
947 :     FIG ID of the relevant feature.
948 :    
949 :     =item RETURN
950 :    
951 :     Returns TRUE if the specified feature is in the database, else FALSE.
952 :    
953 :     =back
954 :    
955 :     =cut
956 :    
957 :     sub IsDeletedFid {
958 :     # Get the parameters.
959 :     my ($self, $fid) = @_;
960 :     # Check for the feature. If the feature does not exist, we'll get an
961 :     # undefined value (FALSE). If it does, we'll get the feature ID itself
962 :     # (TRUE).
963 :     my ($retVal) = $self->GetFlat("Feature", "Feature(id) = ?", [$fid], 'id');
964 :     # Return FALSE if the feature was found, TRUE if it was not found.
965 :     return ($retVal ? 0 : 1);
966 :     }
967 :    
968 : parrello 1.4
969 : parrello 1.1 =head3 GenomeHash
970 :    
971 :     my $genomeHash = $sap->GenomeHash();
972 :    
973 :     Return a hash of the genomes configured to be in this database. The list
974 :     is either taken from the active SEED database or from a configuration
975 :     file in the data directory. The hash maps genome IDs to TRUE.
976 :    
977 :     =cut
978 :    
979 :     sub GenomeHash {
980 :     # Get the parameters.
981 :     my ($self) = @_;
982 :     # We'll build the hash in here.
983 :     my %genomeHash;
984 :     # Do we already have a list?
985 :     if (! defined $self->{genomeHash}) {
986 :     # No, check for a configuration file.
987 :     my $xml = $self->ReadConfigFile();
988 :     if (defined $xml && $xml->{Genomes}) {
989 :     # We found one and it has a genome list, so extract the genomes.
990 :     %genomeHash = map { $_ => 1 } grep { $_ =~ /\S/ } split /\s+/, $xml->{Genomes};
991 :     } else {
992 :     # No, so get the genome list.
993 :     my $fig = $self->GetSourceObject();
994 : parrello 1.8 my @genomes = $fig->genomes();
995 : parrello 1.1 # Verify the genome list to insure every genome has an organism
996 :     # directory.
997 :     for my $genome (@genomes) {
998 :     if (-d "$FIG_Config::organisms/$genome") {
999 :     $genomeHash{$genome} = 1;
1000 :     }
1001 :     }
1002 :     }
1003 :     # Store the genomes in this object.
1004 :     $self->{genomeHash} = \%genomeHash;
1005 :     }
1006 :     # Return the result.
1007 :     return $self->{genomeHash};
1008 :     }
1009 :    
1010 :     =head3 SubsystemID
1011 :    
1012 :     my $subID = $sap->SubsystemID($subName);
1013 :    
1014 :     Return the ID of the subsystem with the specified name.
1015 :    
1016 :     =over 4
1017 :    
1018 :     =item subName
1019 :    
1020 :     Name of the relevant subsystem. A subsystem name with underscores for spaces
1021 :     will return the same ID as a subsystem name with the spaces still in it.
1022 :    
1023 :     =item RETURN
1024 :    
1025 : parrello 1.4 Returns a normalized subsystem name.
1026 : parrello 1.1
1027 :     =back
1028 :    
1029 :     =cut
1030 :    
1031 :     sub SubsystemID {
1032 :     # Get the parameters.
1033 :     my ($self, $subName) = @_;
1034 : parrello 1.4 # Normalize the subsystem name by converting underscores to spaces.
1035 : parrello 1.37 # Underscores at the end are not converted.
1036 : parrello 1.4 my $retVal = $subName;
1037 : parrello 1.38 my $trailer = chop $retVal;
1038 :     $retVal =~ tr/_/ /;
1039 :     $retVal .= $trailer;
1040 : parrello 1.1 # Return the result.
1041 :     return $retVal;
1042 :     }
1043 :    
1044 : parrello 1.14 =head3 Alias
1045 :    
1046 :     my $translatedID = $sap->Alias($fid, $source);
1047 :    
1048 :     Return an alternate ID of the specified type for the specified feature.
1049 :     If no alternate ID of that type exists, the incoming value will be
1050 :     returned unchanged.
1051 :    
1052 :     =over 4
1053 :    
1054 :     =item fid
1055 :    
1056 :     FIG ID of the feature whose alias identifier is desired.
1057 :    
1058 :     =item source
1059 :    
1060 :     Database type for the alternate ID (e.g. C<LocusTag>, C<NCBI>, C<RefSeq>). If
1061 :     C<SEED> is specified, the ID will be returned unchanged and no database lookup
1062 :     will occur.
1063 :    
1064 :     =item RETURN
1065 :    
1066 :     Returns an equivalent ID for the specified feature that belongs to the specified
1067 :     database (that is, has the specified source). If no such ID exists, returns the
1068 :     incoming ID.
1069 :    
1070 :     =back
1071 :    
1072 :     =cut
1073 :    
1074 :     sub Alias {
1075 :     # Get the parameters.
1076 :     my ($self, $fid, $source) = @_;
1077 :     # Default to the incoming value.
1078 :     my $retVal = $fid;
1079 :     # We only have work to do if the database type isn't "SEED".
1080 :     if ($source ne 'SEED') {
1081 :     # Look for an alias.
1082 :     my ($alias) = $self->GetFlat("IsIdentifiedBy Identifier",
1083 : parrello 1.15 'IsIdentifiedBy(from-link) = ? AND Identifier(source) = ?',
1084 : parrello 1.14 [$fid, $source], 'Identifier(natural-form)');
1085 :     # If we found one, return it.
1086 :     if (defined $alias) {
1087 :     $retVal = $alias;
1088 :     }
1089 :     }
1090 :     # Return the result.
1091 :     return $retVal;
1092 :     }
1093 :    
1094 :    
1095 :     =head3 ContigLength
1096 :    
1097 :     my $contigLen = $sap->ContigLength($contigID);
1098 :    
1099 :     Return the number of base pairs in the specified contig.
1100 :    
1101 :     =over 4
1102 :    
1103 :     =item contigID
1104 :    
1105 :     ID of the contig of interest.
1106 :    
1107 :     =item RETURN
1108 :    
1109 :     Returns the number of base pairs in the specified contig, or 0 if the contig
1110 :     does not exist.
1111 :    
1112 :     =back
1113 :    
1114 :     =cut
1115 :    
1116 :     sub ContigLength {
1117 :     # Get the parameters.
1118 :     my ($self, $contigID) = @_;
1119 :     # Try to find the length.
1120 :     my ($retVal) = $self->GetEntityValues(Contig => $contigID, 'length');
1121 :     # Convert not-found to 0.
1122 :     $retVal = 0 if ! defined $retVal;
1123 :     # Return the result.
1124 :     return $retVal;
1125 :     }
1126 :    
1127 :    
1128 :     =head2 Configuration-Related Methods
1129 :    
1130 : parrello 1.1 =head3 SubsystemHash
1131 :    
1132 :     my $subHash = $sap->SubsystemHash();
1133 :    
1134 :     Return a hash of the subsystems configured to be in this database. The
1135 :     list is either taken from the active SEED database or from a
1136 :     configuration file in the data directory. The hash maps subsystem names
1137 :     to TRUE.
1138 :    
1139 :     =cut
1140 :    
1141 :     sub SubsystemHash {
1142 :     # Get the parameters.
1143 :     my ($self) = @_;
1144 :     # We'll build the hash in here.
1145 :     my %subHash;
1146 :     # Do we already have a list?
1147 :     if (! defined $self->{subHash}) {
1148 :     # No, check for a configuration file.
1149 :     my $xml = $self->ReadConfigFile();
1150 :     if (defined $xml && $xml->{Subsystems}) {
1151 :     # We found one, and it has subsystems, so we extract them.
1152 :     # A little dancing is necessary to trim spaces.
1153 :     my @subs = map { $_ =~ /\s*(\S.+\S)/; $1 } split /\n/, $xml->{Subsystems};
1154 :     # Here we need to clear out any null subsystem names resulting from
1155 :     # blank lines in the file.
1156 :     %subHash = map { $_ => 1 } grep { $_ } @subs;
1157 :     } else {
1158 :     # No config file, so we ask the FIG object.
1159 :     my $fig = $self->GetSourceObject();
1160 : parrello 1.36 for my $subsystem ($fig->all_subsystems()) {
1161 :     my $subsysID = $self->SubsystemID($subsystem);
1162 : parrello 1.37 $subHash{$subsysID} = 1;
1163 :     }
1164 : parrello 1.1 }
1165 :     # Store the subsystems in this object.
1166 :     $self->{subHash} = \%subHash;
1167 :     }
1168 :     # Return the result.
1169 :     return $self->{subHash};
1170 :     }
1171 :    
1172 :     =head3 TuningParameter
1173 :    
1174 :     my $parm = $erdb->TuningParameter($parmName);
1175 :    
1176 :     Return the value of the specified tuning parameter. Tuning parameters are
1177 :     read from the XML configuration file.
1178 :    
1179 :     =over 4
1180 :    
1181 :     =item parmName
1182 :    
1183 :     Name of the parameter whose value is desired.
1184 :    
1185 :     =item RETURN
1186 :    
1187 :     Returns the paramter value.
1188 :    
1189 :     =back
1190 :    
1191 :     =cut
1192 :    
1193 :     sub TuningParameter {
1194 :     # Get the parameters.
1195 :     my ($self, $parmName) = @_;
1196 :     # Insure we have the parameters in memory.
1197 :     if (! defined $self->{tuning}) {
1198 :     # Read the configuration file.
1199 :     my $configFile = $self->ReadConfigFile();
1200 :     # Get the tuning parameters (if any).
1201 :     my $tuning;
1202 :     if (! defined $configFile || ! exists $configFile->{TuningParameters}) {
1203 :     $tuning = {};
1204 :     } else {
1205 :     $tuning = $configFile->{TuningParameters};
1206 :     }
1207 :     # Merge in the default option values.
1208 :     Tracer::MergeOptions($tuning, TUNING_DEFAULTS);
1209 :     # Save the result in our object.
1210 :     $self->{tuning} = $tuning;
1211 :     }
1212 :     # Extract the tuning paramter.
1213 :     my $retVal = $self->{tuning}{$parmName};
1214 :     # Throw an error if it does not exist.
1215 :     Confess("Invalid tuning parameter \"$parmName\".") if ! defined $retVal;
1216 :     # Return the result.
1217 :     return $retVal;
1218 :     }
1219 :    
1220 :    
1221 :     =head3 ReadConfigFile
1222 :    
1223 :     my $xmlObject = $sap->ReadConfigFile();
1224 :    
1225 :     Return the hash structure created from reading the configuration file, or
1226 :     an undefined value if the file is not found.
1227 :    
1228 :     =cut
1229 :    
1230 :     sub ReadConfigFile {
1231 :     my ($self) = @_;
1232 :     # Declare the return variable.
1233 :     my $retVal;
1234 :     # Compute the configuration file name.
1235 :     my $fileName = "$self->{loadDirectory}/SaplingConfig.xml";
1236 :     # Did we find it?
1237 :     if (-f $fileName) {
1238 :     # Yes, read it in.
1239 :     $retVal = XMLin($fileName);
1240 :     }
1241 :     # Return the result.
1242 :     return $retVal;
1243 :     }
1244 :    
1245 :     =head3 GlobalSection
1246 :    
1247 :     my $flag = $sap->GlobalSection($name);
1248 :    
1249 :     Return TRUE if the specified section name is the global section, FALSE
1250 :     otherwise.
1251 :    
1252 :     =over 4
1253 :    
1254 :     =item name
1255 :    
1256 :     Section name to test.
1257 :    
1258 :     =item RETURN
1259 :    
1260 : parrello 1.4 Returns TRUE if the parameter matches the GLOBAL constant, else FALSE.
1261 : parrello 1.1
1262 :     =back
1263 :    
1264 :     =cut
1265 :    
1266 :     sub GlobalSection {
1267 :     # Get the parameters.
1268 :     my ($self, $name) = @_;
1269 :     # Return the result.
1270 :     return ($name eq GLOBAL);
1271 :     }
1272 :    
1273 : parrello 1.35 =head3 LoadGenome
1274 :    
1275 :     my $stats = $sap->LoadGenome($genome, $directory);
1276 :    
1277 :     Load the specified genome directory into the database. The genome's DNA, features,
1278 :     protein sequences, and other supporting information will be inserted. If the
1279 :     genome already exists, numerous errors will occur; therefore, it is recommended
1280 :     that the genome be deleted first using the L<ERDB/Delete> method.
1281 :    
1282 :     =over 4
1283 :    
1284 :     =item genom
1285 :    
1286 :     The ID of the genome being loaded.
1287 :    
1288 :     =item directory
1289 :    
1290 :     Name of the genome directory.
1291 :    
1292 :     =item RETURN
1293 :    
1294 :     Returns a statistics object describing the load activity.
1295 :    
1296 :     =back
1297 :    
1298 :     =cut
1299 :    
1300 :     sub LoadGenome {
1301 :     # Get the parameters.
1302 :     my ($self, $genome, $directory) = @_;
1303 :     # Verify that the directory exists.
1304 :     Confess ("Genome directory $directory not found.") if ! -d $directory;
1305 :     # Verify that the ID is valid.
1306 :     Confess("Invalid genome ID $genome.") if $genome !~ /^\d+\.\d+$/;
1307 :     # Import the loader and call it.
1308 :     require SaplingGenomeLoader;
1309 :     my $retVal = SaplingGenomeLoader::Load($self, $genome, $directory);
1310 :     # Return the statistics object.
1311 :     return $retVal;
1312 :     }
1313 :    
1314 : parrello 1.14 =head2 Special-Purpose Methods
1315 :    
1316 :     =head3 ComputeFeatureFilter
1317 :    
1318 : parrello 1.30 my ($objects, $filter, @parms) = $sap->ComputeFeatureFilter($source, $genome);
1319 : parrello 1.14
1320 :     Compute the initial object name list, filter string, and parameter list
1321 :     for a query by feature ID. The object name list will always end with the
1322 :     B<Feature> entity, and the combination of the filter string and parameter
1323 :     list will translate the incoming ID from the specified format to a real
1324 :     FIG feature ID. If the specified format B<is> FIG feature IDs, then the
1325 :     query will start on the B<Feature> entity; otherwise, it will start with
1326 :     the B<Identifier> entity. This is a special-purpose method that performs
1327 :     the task of intelligently modifying queries to allow for external ID
1328 :     types.
1329 :    
1330 :     =over 4
1331 :    
1332 :     =item source (optional)
1333 :    
1334 :     Database source of the IDs specified-- C<SEED> for FIG IDs, C<GENE> for standard
1335 :     gene identifiers, or C<LocusTag> for locus tags. In addition, you may specify
1336 :     C<RefSeq>, C<CMR>, C<NCBI>, C<Trembl>, or C<UniProt> for IDs from those databases.
1337 :     Use C<mixed> to allow mixed ID types (though this may cause problems when the same
1338 : parrello 1.22 ID has different meanings in different databases). Use C<prefixed> to allow IDs with
1339 :     prefixing indicating the ID type (e.g. C<uni|P00934> for a UniProt ID, C<gi|135813> for
1340 :     an NCBI identifier, and so forth). The default is C<SEED>.
1341 : parrello 1.14
1342 : parrello 1.30 =item genome (optional)
1343 :    
1344 :     ID of a genome. If specified, only features from the specified genome will be
1345 :     accepted by the filter. This is important for IDs that are ambiguous between
1346 :     genomes (like Locus Tags). If omitted, no genome filtering will take place.
1347 :    
1348 : parrello 1.14 =item RETURN
1349 :    
1350 :     Returns a list containing parameters to the desired query call. The first element
1351 :     is the prefix for the object name list, the second is the prefix for the filter
1352 :     string, and the subsequent elements form the prefix for the parameter value list.
1353 :    
1354 :     =back
1355 :    
1356 :     =cut
1357 :    
1358 :     sub ComputeFeatureFilter {
1359 :     # Get the parameters.
1360 : parrello 1.30 my ($self, $source, $genome) = @_;
1361 : parrello 1.14 # Declare the return variables.
1362 :     my ($objects, $filter, @parms);
1363 : parrello 1.30 # This will be set to TRUE if we are directly processing FIG IDs.
1364 :     my $figOnly = 0;
1365 : parrello 1.14 # Determine the source type.
1366 :     if (! defined $source || $source eq 'SEED') {
1367 :     # Here we're directly processing FIG IDs.
1368 :     $objects = 'Feature';
1369 : parrello 1.30 $filter = 'Feature(id) = ?';
1370 :     $figOnly = 1;
1371 : parrello 1.14 } elsif ($source eq 'mixed') {
1372 :     # Here we're processing mixed IDs of unknown type.
1373 :     $objects = 'Identifier Identifies Feature';
1374 :     $filter = 'Identifier(natural-form) = ?';
1375 : parrello 1.22 } elsif ($source eq 'prefixed') {
1376 :     # Here we're processing mixed IDs with prefixes. This is the internal form
1377 :     # of the ID.
1378 :     $objects = 'Identifier Identifies Feature';
1379 :     $filter = 'Identifier(id) = ?';
1380 : parrello 1.14 } else {
1381 :     # Here we're processing a fixed ID type from an external database.
1382 :     # This is the case that requires an additional parameter. Note that
1383 :     # we insist that the additional parameter matches the first parameter
1384 :     # mark.
1385 :     $objects = 'Identifier Identifies Feature';
1386 :     $filter = 'Identifier(source) = ? AND Identifier(natural-form) = ?';
1387 :     push @parms, $source;
1388 :     }
1389 : parrello 1.30 # Was a genome ID specified?
1390 :     if ($genome) {
1391 :     # Yes. Add genome filtering.
1392 :     if ($figOnly) {
1393 :     # In a FIG ID situation, we can simply add the genome filtering to the front
1394 :     # of the object list.
1395 :     $objects = "Genome IsOwnerOf $objects";
1396 :     } else {
1397 :     # Otherwise, we need to do an AND thing.
1398 :     $objects = "Genome IsOwnerOf Feature AND $objects";
1399 :     }
1400 :     # Add the genome ID to the filter clause.
1401 :     $filter = "Genome(id) = ? AND $filter";
1402 :     # Add it to the parameter list.
1403 :     unshift @parms, $genome;
1404 :     }
1405 : parrello 1.14 # Return the results.
1406 :     return ($objects, $filter, @parms);
1407 :     }
1408 :    
1409 :    
1410 :     =head3 FindGapLeft
1411 :    
1412 :     my @operonData = $sap->FindGapLeft($loc, $maxGap, $interval, \%redundancyHash, \$redundancyFlag);
1413 :    
1414 :     This method performs a rather arcane task: searching for a gap to the
1415 :     left of a location in the contig. The search will proceed from the
1416 :     starting point to the left, and will stop when a gap between occupied
1417 :     locations is found that is larger than the specified maximum. The caller
1418 :     has the option of specifying a hash of feature IDs that are redundant. If
1419 :     any feature in the hash is found, the search will stop early and the
1420 :     provided redundancy flag will be set. In addition, an interval size can
1421 :     be specified to tune the process of retrieving data from the database.
1422 :    
1423 :     =over 4
1424 :    
1425 :     =item loc
1426 :    
1427 :     L<BasicLocation> object for the location from which the search is to start.
1428 :     This gives us the contig ID, the strand of interest (forward or backward),
1429 :     and the starting point of the search.
1430 :    
1431 :     =item maxGap
1432 :    
1433 :     The maximum allowable gap. The search will stop at the left end of the contig
1434 :     or the first gap larger than this amount.
1435 :    
1436 :     =item interval (optional)
1437 :    
1438 :     Interval to use for retrieving data from the database. This is the size of
1439 :     the contig segments being retrieved. The default is C<10000>
1440 :    
1441 :     =item redundancyHash (optional)
1442 :    
1443 :     A hash of feature IDs. If any feature present in this hash is found during
1444 :     the search, the search will stop and no data will be returned. The default
1445 :     is an empty hash (no check).
1446 :    
1447 :     =item redundancyFlag (optional)
1448 :    
1449 :     A reference to a scalar flag. If present, the entire method will be bypassed
1450 :     if the flag is TRUE. If a redundancy hash is specified and a redundant feature
1451 :     is found, this flag will be set to TRUE by the method.
1452 :    
1453 :     =item RETURN
1454 :    
1455 :     Returns a list of 4-tuples. Each tuple will contain a feature ID, a begin
1456 :     offset, a direction (C<+> or C<->), and a length, representing an occupied
1457 :     location on the contig and the feature to which it belongs. The complete
1458 :     list of locations will be to the left of the starting location and relatively
1459 :     close together, with no gap larger than the caller-specified maximum.
1460 :    
1461 :     =back
1462 :    
1463 :     =cut
1464 :    
1465 :     sub FindGapLeft {
1466 :     # Get the parameters.
1467 :     my ($self, $loc, $maxGap, $interval, $redundancyHash, $redundancyFlag) = @_;
1468 :     # Declare the return variable.
1469 :     my @retVal;
1470 :     # Fix up defaults for the missing parameters.
1471 :     $interval ||= 10000;
1472 :     if (! defined $redundancyHash) {
1473 :     $redundancyHash = {};
1474 :     }
1475 :     my $fakeFlag = 0;
1476 :     if (! defined $redundancyFlag) {
1477 :     $redundancyFlag = \$fakeFlag;
1478 :     }
1479 :     # This flag will be set to TRUE if we run out of locations or find a gap.
1480 :     my $gapFound = 0;
1481 :     # This will be used to store tuples found. If we are successful, it will
1482 :     # be copied to the return list.
1483 :     my @operonData;
1484 :     # Now we need to set up some data for the loop. In particular, the contig
1485 :     # ID, the strand (direction), and the starting point. We add one to the
1486 :     # starting current position to insure that the starting point is included
1487 :     # in the first search.
1488 :     my $currentPosition = $loc->Left + 1;
1489 :     my $contigID = $loc->Contig;
1490 :     my $strand = $loc->Dir;
1491 :     # This variable keeps the leftmost begin location found.
1492 :     my $begin = $loc->Left;
1493 :     # Loop until we find a redundancy or a gap.
1494 :     while (! $$redundancyFlag && ! $gapFound && $currentPosition >= 0) {
1495 :     # Compute the limits of the search interval for this iteration.
1496 :     my $nextPosition = $currentPosition - $interval;
1497 :     # Get all the locations in the interval.
1498 :     my @rows = $self->GetAll("IsLocatedIn",
1499 :     'IsLocatedIn(to-link) = ? AND IsLocatedIn(dir) = ? AND IsLocatedIn(begin) >= ? AND IsLocatedIn(begin) < ?',
1500 :     [$contigID, $strand, $nextPosition, $currentPosition],
1501 :     [qw(from-link begin dir len)]);
1502 :     # If nothing was found, it's a gap.
1503 :     if (! @rows) {
1504 :     $gapFound = 1;
1505 :     } else {
1506 :     # We got something, so we can loop through looking for gaps. The search
1507 :     # requires we sort by right point.
1508 :     my @sortableTuples;
1509 :     for my $tuple (@rows) {
1510 :     my ($fid, $left, $dir, $len) = @$tuple;
1511 :     push @sortableTuples, [$left + $len, $tuple];
1512 :     }
1513 :     my @sortedTuples = map { $_->[1] } sort { -($a->[0] <=> $b->[0]) } @sortableTuples;
1514 :     # Loop through the tuples, stopping at the first redundancy or gap.
1515 :     for my $tuple (@sortedTuples) { last if $gapFound || $$redundancyFlag;
1516 :     # Get this tuple's data.
1517 :     my ($fid, $left, $dir, $len) = @$tuple;
1518 :     # Is it close enough to be counted?
1519 :     if ($begin - ($left + $len) <= $maxGap) {
1520 :     # Yes. We can include this tuple.
1521 :     push @operonData, $tuple;
1522 :     # Update the begin point.
1523 :     $begin = $left;
1524 :     # Is it redundant? It's only reasonable to ask this if it's
1525 :     # an included feature.
1526 :     if ($redundancyHash->{$fid}) {
1527 :     $$redundancyFlag = 1;
1528 :     }
1529 :     } else {
1530 :     # No, it's not close enough. We've found a gap.
1531 :     $gapFound = 1;
1532 :     }
1533 :     }
1534 :     }
1535 :     # Set up for the next interval.
1536 :     $currentPosition = $nextPosition;
1537 :     }
1538 :     # If we're nonredundant, save our results.
1539 :     if (! $$redundancyFlag) {
1540 :     @retVal = @operonData;
1541 :     }
1542 :     # Return the result.
1543 :     return @retVal;
1544 :     }
1545 :    
1546 :     =head3 FindGapRight
1547 : parrello 1.13
1548 : parrello 1.14 my @operonData = $sap->FindGapRight($loc, $maxGap, $interval, \%redundancyHash, \$redundancyFlag);
1549 : parrello 1.13
1550 : parrello 1.14 This method is the dual of L</FindGapLeft>: it searches for a gap to the
1551 :     right of a location in the contig. The search will proceed from the
1552 :     starting point to the right, and will stop when a gap between occupied
1553 :     locations is found that is larger than the specified maximum. The caller
1554 :     has the option of specifying a hash of feature IDs that are redundant. If
1555 :     any feature in the hash is found, the search will stop early and the
1556 :     provided redundancy flag will be set. In addition, an interval size can
1557 :     be specified to tune the process of retrieving data from the database.
1558 : parrello 1.13
1559 :     =over 4
1560 :    
1561 : parrello 1.14 =item loc
1562 :    
1563 :     L<BasicLocation> object for the location from which the search is to start.
1564 :     This gives us the contig ID, the strand of interest (forward or backward),
1565 :     and the starting point of the search.
1566 :    
1567 :     =item maxGap
1568 :    
1569 :     The maximum allowable gap. The search will stop at the right end of the contig
1570 :     or the first gap larger than this amount.
1571 :    
1572 :     =item interval (optional)
1573 :    
1574 :     Interval to use for retrieving data from the database. This is the size of
1575 :     the contig segments being retrieved. The default is C<10000>
1576 :    
1577 :     =item redundancyHash (optional)
1578 :    
1579 :     A hash of feature IDs. If any feature present in this hash is found during
1580 :     the search, the search will stop and no data will be returned. The default
1581 :     is an empty hash (no check).
1582 :    
1583 :     =item redundancyFlag (optional)
1584 : parrello 1.13
1585 : parrello 1.14 A reference to a scalar flag. If present, the entire method will be bypassed
1586 :     if the flag is TRUE. If a redundancy hash is specified and a redundant feature
1587 :     is found, this flag will be set to TRUE by the method.
1588 : parrello 1.13
1589 :     =item RETURN
1590 :    
1591 : parrello 1.14 Returns a list of 4-tuples. Each tuple will contain a feature ID, a begin
1592 :     offset, a direction (C<+> or C<->), and a length, representing an occupied
1593 :     location on the contig and the feature to which it belongs. The complete
1594 :     list of locations will be to the right of the starting location and relatively
1595 :     close together, with no gap larger than the caller-specified maximum.
1596 : parrello 1.13
1597 :     =back
1598 :    
1599 :     =cut
1600 :    
1601 : parrello 1.14 sub FindGapRight {
1602 : parrello 1.13 # Get the parameters.
1603 : parrello 1.14 my ($self, $loc, $maxGap, $interval, $redundancyHash, $redundancyFlag) = @_;
1604 :     # Declare the return variable.
1605 :     my @retVal;
1606 :     # Fix up defaults for the missing parameters.
1607 :     $interval ||= 10000;
1608 :     if (! defined $redundancyHash) {
1609 :     $redundancyHash = {};
1610 :     }
1611 :     my $fakeFlag = 0;
1612 :     if (! defined $redundancyFlag) {
1613 :     $redundancyFlag = \$fakeFlag;
1614 :     }
1615 :     # This flag will be set to TRUE if we run out of locations or find a gap.
1616 :     my $gapFound = 0;
1617 :     # This will be used to store tuples found. If we are successful, it will
1618 :     # be copied to the return list.
1619 :     my @operonData;
1620 :     # Now we need to set up some data for the loop. In particular, the contig
1621 :     # ID, the strand (direction), and the starting point. We subtract one from the
1622 :     # starting current position to insure that the starting point is included
1623 :     # in the first search.
1624 :     my $currentPosition = $loc->Left - 1;
1625 :     my $contigID = $loc->Contig;
1626 :     my $strand = $loc->Dir;
1627 :     # Get the length of the contig.
1628 :     my $contigLen = $self->ContigLength($contigID);
1629 : parrello 1.19 Trace("Contig length is $contigLen. Starting at $currentPosition.") if T(3);
1630 : parrello 1.14 # This variable keeps the rightmost end location found.
1631 :     my $endPoint = $loc->Left;
1632 :     # Loop until we find a redundancy or a gap.
1633 :     while (! $$redundancyFlag && ! $gapFound && $currentPosition <= $contigLen) {
1634 : parrello 1.19 Trace("Checking at $currentPosition.") if T(3);
1635 : parrello 1.14 # Compute the limits of the search interval for this iteration.
1636 :     my $nextPosition = $currentPosition + $interval;
1637 :     # Get all the locations in the interval.
1638 :     my @rows = $self->GetAll("IsLocatedIn",
1639 :     'IsLocatedIn(to-link) = ? AND IsLocatedIn(dir) = ? AND IsLocatedIn(begin) >= ? AND IsLocatedIn(begin) < ?',
1640 : parrello 1.19 [$contigID, $strand, $currentPosition, $nextPosition],
1641 : parrello 1.14 [qw(from-link begin dir len)]);
1642 :     # If nothing was found, it's a gap.
1643 :     if (! @rows) {
1644 :     $gapFound = 1;
1645 : parrello 1.19 Trace("No result. Gap found.") if T(3);
1646 : parrello 1.14 } else {
1647 :     # We got something, so we can loop through looking for gaps. The search
1648 :     # requires we sort by left point.
1649 :     my @sortedTuples = sort { $a->[1] <=> $b->[1] } @rows;
1650 :     # Loop through the tuples, stopping at the first redundancy or gap.
1651 :     for my $tuple (@sortedTuples) { last if $gapFound || $$redundancyFlag;
1652 :     # Get this tuple's data.
1653 :     my ($fid, $left, $dir, $len) = @$tuple;
1654 :     # Is it close enough to be counted?
1655 :     if ($left - $endPoint <= $maxGap) {
1656 :     # Yes. We can include this tuple.
1657 :     push @operonData, $tuple;
1658 :     # Update the end point.
1659 :     $endPoint = $left + $len;
1660 :     # Is it redundant? It's only reasonable to ask this if it's
1661 :     # an included feature.
1662 :     if ($redundancyHash->{$fid}) {
1663 :     $$redundancyFlag = 1;
1664 :     }
1665 :     } else {
1666 :     # No, it's not close enough. We've found a gap.
1667 :     $gapFound = 1;
1668 : parrello 1.19 Trace("Long distance. Gap found.") if T(3);
1669 : parrello 1.14 }
1670 :     }
1671 :     }
1672 :     # Set up for the next interval.
1673 :     $currentPosition = $nextPosition;
1674 :     }
1675 :     # If we're nonredundant, save our results.
1676 :     if (! $$redundancyFlag) {
1677 :     @retVal = @operonData;
1678 :     }
1679 : parrello 1.13 # Return the result.
1680 : parrello 1.14 return @retVal;
1681 : parrello 1.13 }
1682 :    
1683 : parrello 1.17 =head3 GenomesInPairSet
1684 :    
1685 :     my @genomes = $sap->GenomesInPairSet($pairSetID);
1686 :    
1687 :     Return a list of the IDs for all of the genomes represented in the
1688 :     specified pair set. This is useful when analyzing what data is missing
1689 :     from the coupling tables.
1690 :    
1691 :     =over 4
1692 :    
1693 :     =item pairSetID
1694 :    
1695 :     ID of the pair set to examine.
1696 :    
1697 :     =item RETURN
1698 :    
1699 :     Returns a list of the IDs for the genomes represented in the specified pair set.
1700 :    
1701 :     =back
1702 :    
1703 :     =cut
1704 :    
1705 :     sub GenomesInPairSet {
1706 :     # Get the parameters.
1707 :     my ($self, $pairSetID) = @_;
1708 :     # We'll use this hash to isolate the genome IDs.
1709 :     my %retVal;
1710 :     # Get all the pairs in this set.
1711 :     my $query = $self->Get("IsDeterminedBy", "IsDeterminedBy(from-link) = ?",
1712 :     [$pairSetID]);
1713 :     while (my $pairData = $query->Fetch()) {
1714 :     # Record the genomes for the pegs in the pair. The pegs can be found
1715 :     # separated by a colon in the pairing ID.
1716 :     for my $peg (split /:/, $pairData->PrimaryValue('to-link')) {
1717 :     $retVal{genome_of($peg)} = 1;
1718 :     }
1719 :     }
1720 :     # Return the genome IDs.
1721 :     return keys %retVal;
1722 :     }
1723 :    
1724 : parrello 1.1
1725 :     =head2 Virtual Methods
1726 :    
1727 : parrello 1.11 =head3 PreferredName
1728 :    
1729 :     my $name = $erdb->PreferredName();
1730 :    
1731 :     Return the variable name to use for this database when generating code.
1732 :    
1733 :     =cut
1734 :    
1735 :     sub PreferredName {
1736 :     return 'sap';
1737 :     }
1738 :    
1739 : parrello 1.1 =head3 GetSourceObject
1740 :    
1741 :     my $source = $erdb->GetSourceObject();
1742 :    
1743 :     Return the object to be used in creating load files for this database. This is
1744 :     only the default source object. Loaders have the option of overriding the chosen
1745 : parrello 1.11 source object when constructing the L<ERDBLoadGroup> objects.
1746 : parrello 1.1
1747 :     =cut
1748 :    
1749 :     sub GetSourceObject {
1750 :     my ($self) = @_;
1751 :     # Insure the source object exists in our internal cache.
1752 :     if (! defined $self->{source}) {
1753 :     # We require the FIG object. If the user has no intention of
1754 :     # doing a load, this method won't be used, so he won't need to
1755 :     # have the FIG object on his system.
1756 :     require FIG;
1757 :     $self->{source} = FIG->new();
1758 :     }
1759 :     # Return it to the caller.
1760 :     return $self->{source};
1761 :     }
1762 :    
1763 :     =head3 SectionList
1764 :    
1765 :     my @sections = $erdb->SectionList();
1766 :    
1767 :     Return a list of the names for the different data sections used when loading this database.
1768 :     The default is a single string, in which case there is only one section representing the
1769 :     entire database.
1770 :    
1771 :     =cut
1772 :    
1773 :     sub SectionList {
1774 :     # Get the parameters.
1775 :     my ($self) = @_;
1776 : parrello 1.39 # The section names will be put in here.
1777 :     my @retVal;
1778 : parrello 1.1 # Get the genome hash.
1779 :     my $genomes = $self->GenomeHash();
1780 : parrello 1.39 # Check to see if we have alias files. If we do, we restrict our sections to ones
1781 :     # for which alias files exist.
1782 :     my @aliasFiles = Tracer::OpenDir($self->LoadDirectory() . "/AliasData", 1, 1);
1783 :     if (@aliasFiles) {
1784 :     for my $aliasFile (sort @aliasFiles) {
1785 :     if ($aliasFile =~ /alias\.(\d+\.\d+)\.tbl/) {
1786 :     my $genome = $1;
1787 :     if ($genomes->{$genome}) {
1788 :     push @retVal, $genome;
1789 :     }
1790 :     }
1791 :     }
1792 :     } else {
1793 :     # No alias files, so just use them all.
1794 :     @retVal = sort keys %$genomes;
1795 :     }
1796 : parrello 1.1 # Append the global section.
1797 :     push @retVal, GLOBAL;
1798 :     # Return the section list.
1799 :     return @retVal;
1800 :     }
1801 :    
1802 :     =head3 Loader
1803 :    
1804 :     my $groupLoader = $erdb->Loader($groupName, $source, $options);
1805 :    
1806 : parrello 1.11 Return an L<ERDBLoadGroup> object for the specified load group. This method is used
1807 :     by L<ERDBGenerator.pl> to create the load group objects. If you are not using
1808 :     L<ERDBGenerator.pl>, you don't need to override this method.
1809 : parrello 1.1
1810 :     =over 4
1811 :    
1812 :     =item groupName
1813 :    
1814 :     Name of the load group whose object is to be returned. The group name is
1815 :     guaranteed to be a single word with only the first letter capitalized.
1816 :    
1817 :     =item source
1818 :    
1819 :     The source object used to access the data from which the load file is derived. This
1820 :     is the same object returned by L</GetSourceObject>; however, we allow the caller to pass
1821 :     it in as a parameter so that we don't end up creating multiple copies of a potentially
1822 :     expensive data structure. It is permissible for this value to be undefined, in which
1823 :     case the source will be retrieved the first time the client asks for it.
1824 :    
1825 :     =item options
1826 :    
1827 :     Reference to a hash of command-line options.
1828 :    
1829 :     =item RETURN
1830 :    
1831 : parrello 1.11 Returns an L<ERDBLoadGroup> object that can be used to process the specified load group
1832 : parrello 1.1 for this database.
1833 :    
1834 :     =back
1835 :    
1836 :     =cut
1837 :    
1838 :     sub Loader {
1839 :     # Get the parameters.
1840 :     my ($self, $groupName, $options) = @_;
1841 :     # Compute the loader name.
1842 :     my $loaderClass = "${groupName}SaplingLoader";
1843 :     # Pull in its definition.
1844 :     require "$loaderClass.pm";
1845 :     # Create an object for it.
1846 :     my $retVal = eval("$loaderClass->new(\$self, \$options)");
1847 :     # Insure it worked.
1848 :     Confess("Could not create $loaderClass object: $@") if $@;
1849 :     # Return it to the caller.
1850 :     return $retVal;
1851 :     }
1852 :    
1853 :     =head3 LoadGroupList
1854 :    
1855 :     my @groups = $erdb->LoadGroupList();
1856 :    
1857 :     Returns a list of the names for this database's load groups. This method is used
1858 : parrello 1.11 by L<ERDBGenerator.pl> when the user wishes to load all table groups. The default
1859 : parrello 1.1 is a single group called 'All' that loads everything.
1860 :    
1861 :     =cut
1862 :    
1863 :     sub LoadGroupList {
1864 :     # Return the list.
1865 : parrello 1.34 return qw(Alignment Expression Subsystem Family Feature Protein Genome Scenario Model);
1866 : parrello 1.1 }
1867 :    
1868 :     =head3 LoadDirectory
1869 :    
1870 :     my $dirName = $erdb->LoadDirectory();
1871 :    
1872 :     Return the name of the directory in which load files are kept. The default is
1873 :     the FIG temporary directory, which is a really bad choice, but it's always there.
1874 :    
1875 :     =cut
1876 :    
1877 :     sub LoadDirectory {
1878 :     # Get the parameters.
1879 :     my ($self) = @_;
1880 :     # Return the directory name.
1881 :     return $self->{loadDirectory};
1882 :     }
1883 :    
1884 : parrello 1.21 =head3 UseInternalDBD
1885 :    
1886 :     my $flag = $erdb->UseInternalDBD();
1887 :    
1888 :     Return TRUE if this database should be allowed to use an internal DBD.
1889 :     The internal DBD is stored in the C<_metadata> table, which is created
1890 :     when the database is loaded. The Sapling uses an internal DBD.
1891 :    
1892 :     =cut
1893 :    
1894 :     sub UseInternalDBD {
1895 :     return 1;
1896 :     }
1897 : parrello 1.1
1898 :     1;

MCS Webmaster
ViewVC Help
Powered by ViewVC 1.0.3