[Bio] / Sprout / Sapling.pm Repository:
ViewVC logotype

Annotation of /Sprout/Sapling.pm

Parent Directory Parent Directory | Revision Log Revision Log


Revision 1.44 - (view) (download) (as text)

1 : parrello 1.1 #!/usr/bin/perl -w
2 :    
3 :     #
4 :     # Copyright (c) 2003-2006 University of Chicago and Fellowship
5 :     # for Interpretations of Genomes. All Rights Reserved.
6 :     #
7 :     # This file is part of the SEED Toolkit.
8 :     #
9 :     # The SEED Toolkit is free software. You can redistribute
10 :     # it and/or modify it under the terms of the SEED Toolkit
11 :     # Public License.
12 :     #
13 :     # You should have received a copy of the SEED Toolkit Public License
14 :     # along with this program; if not write to the University of Chicago
15 :     # at info@ci.uchicago.edu or the Fellowship for Interpretation of
16 :     # Genomes at veronika@thefig.info or download a copy from
17 :     # http://www.theseed.org/LICENSE.TXT.
18 :     #
19 :    
20 : olson 1.44 #
21 :     # This is a SAS component.
22 :     #
23 :    
24 : parrello 1.1 package Sapling;
25 :    
26 :     use strict;
27 :     use Tracer;
28 : parrello 1.11 use base qw(ERDB);
29 : parrello 1.1 use Stats;
30 : parrello 1.9 use DBKernel;
31 : parrello 1.11 use SeedUtils;
32 :     use BasicLocation;
33 : parrello 1.41 use ERDBGenerate;
34 : parrello 1.1 use XML::Simple;
35 : parrello 1.27 use Digest::MD5;
36 : parrello 1.1
37 :     =head1 Sapling Package
38 :    
39 :     Sapling Database Access Methods
40 :    
41 :     =head2 Introduction
42 :    
43 : parrello 1.11 The Sapling database is a new Entity-Relationship Database that attempts to
44 :     encapsulate our data in a portable form for distribution. It is loaded directly
45 :     from the genomes and subsystems of the SEED. This object has minimal
46 :     capabilities: most of its power comes the L<ERDB> base class.
47 : parrello 1.1
48 :     The fields in this object are as follows.
49 :    
50 :     =over 4
51 :    
52 :     =item loadDirectory
53 :    
54 :     Name of the directory containing the files used by the loaders.
55 :    
56 :     =item loaderSource
57 :    
58 : parrello 1.11 Source object for the loaders (a L<FIG> in our case).
59 : parrello 1.1
60 :     =item genomeHash
61 :    
62 :     Reference to a hash of the genomes to include when loading.
63 :    
64 :     =item subHash
65 :    
66 :     Reference to a hash of the subsystems to include when loading.
67 :    
68 :     =item tuning
69 :    
70 :     Reference to a hash of tuning parameters.
71 :    
72 : parrello 1.16 =item otuHash
73 :    
74 :     Reference to a hash that maps genome IDs to genome set names.
75 :    
76 : parrello 1.1 =back
77 :    
78 : parrello 1.8 =head2 Configuration and Construction
79 : parrello 1.1
80 :     The default loading profile for the Sapling database is to include all complete
81 :     genomes and all usable subsystems. This can be overridden by specifying a list of
82 :     genomes and subsystems in an XML configuration file. The file name should be
83 :     C<SaplingConfig.xml> in the specified data directory. The document element should
84 :     be C<Sapling>, and it has two sub-elements. The C<Genomes> element should contain as
85 :     its text a space-delimited list of genome IDs. The <Subsystems> element should contain
86 :     a list of subsystem names, one per line. If a particular section is missing, the
87 :     default list will be used.
88 :    
89 :     =head3 Example
90 :    
91 :     The following configuration file specifies 10 genomes and 6 subsystems.
92 :    
93 :     <Sapling>
94 :     <Genomes>
95 :     100226.1 31033.3 31964.1 36873.1 126740.4
96 :     155864.1 349307.7 350058.5 351348.5 412694.5
97 :     </Genomes>
98 :     <Subsystems>
99 :     Sugar_utilization_in_Thermotogales
100 :     Coenzyme_F420_hydrogenase
101 :     Ribosome_activity_modulation
102 :     prophage_tails
103 :     CBSS-393130.3.peg.794
104 :     Apigenin_derivatives
105 :     </Subsystems>
106 :     </Sapling>
107 :    
108 :     The XML file also contains tuning parameters that affect the way the data
109 :     is loaded. These are specified as attributes in the TuningParameters element,
110 :     as follows.
111 :    
112 :     =over 4
113 :    
114 :     =item maxLocationLength
115 :    
116 :     The maximum number of base pairs allowed in a single location. B<IsLocatedIn>
117 :     records are split into sections based on this length, so when you are looking
118 :     for all the features in a particular neighborhood, you can look for locations
119 :     within the maximum location distance from the neighborhood, and even if you have
120 :     a huge operon that contains tens of thousands of base pairs, you'll still be
121 :     able to find it.
122 :    
123 : parrello 1.4 =item maxSequenceLength
124 :    
125 :     The maximum number of base pairs allowed in a single DNA sequence. DNA sequences
126 :     are broken into segments to prevent excessively large genomes from clogging
127 :     memory during sequence resolution.
128 :    
129 : parrello 1.1 =back
130 :    
131 :     =head3 Global Section Constant
132 :    
133 :     Each section of the database used by the loader corresponds to a single genome.
134 :     The global section is loaded after all the others, and is concerned with data
135 :     not related to a particular genome.
136 :    
137 :     =cut
138 :    
139 :     # Name of the global section
140 :     use constant GLOBAL => 'Globals';
141 :    
142 :     =head3 Tuning Parameter Defaults
143 :    
144 :     Each tuning parameter must have a default value, in case it is not present in
145 :     the XML configuration file. The defaults are specified in a constant hash
146 :     reference called C<TUNING_DEFAULTS>.
147 :    
148 :     =cut
149 :    
150 :     use constant TUNING_DEFAULTS => {
151 : parrello 1.4 maxLocationLength => 4000,
152 : parrello 1.27 maxSequenceLength => 10000,
153 : parrello 1.1 };
154 :    
155 :     =head3 new
156 :    
157 :     my $sap = Sapling->new(%options);
158 :    
159 :     Construct a new Sapling object. The following options are supported.
160 :    
161 :     =over 4
162 :    
163 :     =item loadDirectory
164 :    
165 :     Data directory to be used by the loaders.
166 :    
167 : parrello 1.7 =item DBD
168 : parrello 1.1
169 :     XML database definition file.
170 :    
171 :     =item dbName
172 :    
173 :     Name of the database to use.
174 :    
175 :     =item sock
176 :    
177 :     Socket for accessing the database.
178 :    
179 :     =item userData
180 :    
181 :     Name and password used to log on to the database, separated by a slash.
182 :    
183 :     =item dbhost
184 :    
185 :     Database host name.
186 :    
187 : parrello 1.25 =item port
188 :    
189 : parrello 1.35 MYSQL port number to use (MySQL only).
190 :    
191 :     =item dbms
192 :    
193 :     Database management system to use (e.g. C<SQLite> or C<postgres>, default C<mysql>).
194 : parrello 1.25
195 : parrello 1.1 =back
196 :    
197 :     =cut
198 :    
199 :     sub new {
200 :     # Get the parameters.
201 :     my ($class, %options) = @_;
202 :     # Get the options.
203 : parrello 1.5 my $loadDirectory = $options{loadDirectory} || $FIG_Config::saplingData ||
204 :     "$FIG_Config::fig/SaplingData";
205 : parrello 1.7 my $dbd = $options{DBD} || "$loadDirectory/SaplingDBD.xml";
206 : olson 1.6 my $dbName = $options{dbName} || $FIG_Config::saplingDB || "nmpdr_sapling";
207 : parrello 1.3 my $userData = $options{userData} || "seed/";
208 :     my $dbhost = $options{dbhost} || $FIG_Config::saplingHost || "localhost";
209 : parrello 1.25 my $port = $options{port} || 3306;
210 : parrello 1.35 my $dbms = $options{dbms} || 'mysql';
211 : parrello 1.25 # Compute the socket. An empty string is a valid override here.
212 :     my $sock = $options{sock};
213 :     if (! defined $sock) {
214 :     $sock = $FIG_Config::sproutSock || "";
215 :     }
216 : parrello 1.1 # Compute the user name and password.
217 :     my ($user, $pass) = split '/', $userData, 2;
218 :     $pass = "" if ! defined $pass;
219 : parrello 1.24 Trace("Connecting to sapling database.") if T(2);
220 : parrello 1.1 # Connect to the database.
221 : parrello 1.35 my $dbh = DBKernel->new($dbms, $dbName, $user, $pass, $port, $dbhost, $sock);
222 : parrello 1.1 # Create the ERDB object.
223 : parrello 1.2 my $retVal = ERDB::new($class, $dbh, $dbd, %options);
224 : parrello 1.1 # Add the load directory pointer.
225 :     $retVal->{loadDirectory} = $loadDirectory;
226 :     # Set up the spaces for the loader source object, the subsystem hash, the
227 :     # genome hash, and the tuning parameters.
228 :     $retVal->{source} = undef;
229 :     $retVal->{genomeHash} = undef;
230 :     $retVal->{subHash} = undef;
231 :     $retVal->{tuning} = undef;
232 : parrello 1.16 # Set up the hash of genome IDs to OTUs.
233 :     $retVal->{otuHash} = {};
234 : parrello 1.1 # Return it.
235 :     return $retVal;
236 :     }
237 :    
238 : parrello 1.8 =head2 Public Methods
239 :    
240 : parrello 1.16 =head3 OTU
241 :    
242 :     my $otu = $sap->OTU($genomeID);
243 :    
244 :     Return the name of the Organism Taxonomic Unit (GenomeSet) for the
245 :     specified genome ID. OTU information is cached in memory, so that once it
246 :     is known, it does not need to be re-fetched from the database.
247 :    
248 :     =over 4
249 :    
250 :     =item genomeID
251 :    
252 :     ID of a genome or feature. If a feature ID is specified, the genome ID will be
253 :     extracted from it.
254 :    
255 :     =item RETURN
256 :    
257 :     Returns the name of the genome set for the specified genome, or C<undef> if the
258 :     genome is not in the
259 :    
260 :     =back
261 :    
262 :     =cut
263 :    
264 :     sub OTU {
265 :     # Get the parameters.
266 :     my ($self, $genomeID) = @_;
267 :     # Get the OTU hash.
268 :     my $otuHash = $self->{otuHash};
269 :     # Compute the real genome ID.
270 :     my $realGenomeID = ($genomeID =~ /^fig\|(\d+\.\d+)/ ? $1 : $genomeID);
271 :     # Look it up in the hash.
272 :     my $retVal = $otuHash->{$realGenomeID};
273 :     # Was it found?
274 :     if (! defined $retVal) {
275 :     # No, get the OTU from the database.
276 :     ($retVal) = $self->GetFlat("IsCollectedInto", "IsCollectedInto(from-link) = ?",
277 :     [$realGenomeID], "to-link");
278 :     # Save it in the hash for future use.
279 :     $otuHash->{$realGenomeID} = $retVal;
280 :     }
281 :     # Return the result.
282 :     return $retVal;
283 :     }
284 :    
285 : parrello 1.27 =head3 ProteinID
286 :    
287 :     my $key = $sap->ProteinID($sequence);
288 :    
289 :     Return the protein sequence ID that would be associated with a specific
290 :     protein sequence.
291 :    
292 :     =over 4
293 :    
294 :     =item sequence
295 :    
296 :     String containing the protein sequence in question.
297 :    
298 :     =item RETURN
299 :    
300 :     Returns the ID value for the specified protein sequence. If the sequence exists
301 :     in the database, it will have this ID in the B<ProteinSequence> table.
302 :    
303 :     =back
304 :    
305 :     =cut
306 :    
307 :     sub ProteinID {
308 :     # Get the parameters.
309 :     my ($self, $sequence) = @_;
310 :     # Compute the MD5 hash.
311 :     my $retVal = Digest::MD5::md5_hex($sequence);
312 :     # Return the result.
313 :     return $retVal;
314 :     }
315 :    
316 :    
317 : parrello 1.28 =head3 IsProteinID
318 :    
319 :     my $md5 = $sap->IsProteinID($identifier);
320 :    
321 :     Check for a protein identifier. If a protein identifier is found, the
322 :     corresponding protein sequence ID will be returned; otherwise, an
323 :     undefined value will be returned. A protein identifier is either a
324 :     raw protein sequence ID, an ID preceded by C<md5|>, or an ID preceded by
325 :     C<gnl|md5|>
326 :    
327 :     =over 4
328 :    
329 :     =item identifier
330 :    
331 :     Identifier to test.
332 :    
333 :     =item RETURN
334 :    
335 :     Returns the MD5 code from the protein identifier, or C<undef> if the incoming
336 :     string is not a protein identifier.
337 :    
338 :     =back
339 :    
340 :     =cut
341 :    
342 :     sub IsProteinID {
343 :     # Get the parameters.
344 :     my ($self, $identifier) = @_;
345 :     # Declare the return variable.
346 :     my $retVal;
347 :     # Check the input.
348 :     if ($identifier =~ /^(?:gnl\|)?(?:md5\|)?([0-9a-f]{32})$/) {
349 :     $retVal = $1;
350 :     }
351 :     # Return the result.
352 :     return $retVal;
353 :     }
354 :    
355 :    
356 : parrello 1.18 =head3 Assignment
357 :    
358 :     my $assignment = $sapling->Assignment($fid);
359 :    
360 :     Return the functional assignment for the specified feature.
361 :    
362 :     =over 4
363 :    
364 :     =item fid
365 :    
366 :     FIG ID of the desired feature.
367 :    
368 :     =item RETURN
369 :    
370 :     Returns the functional assignment of the specified feature, or C<undef>
371 :     if the feature does not exist.
372 :    
373 :     =back
374 :    
375 :     =cut
376 :    
377 :     sub Assignment {
378 :     # Get the parameters.
379 :     my ($self, $fid) = @_;
380 :     # Get the functional assignment.
381 :     my ($retVal) = $self->GetFlat("Feature", "Feature(id) = ?", [$fid], 'function');
382 :     # Return the result.
383 :     return $retVal;
384 :     }
385 :    
386 : parrello 1.27 =head3 IdsForProtein
387 :    
388 :     my @ids = $sap->IdsForProtein($protID);
389 :    
390 :     Return a list of all the identifiers associated with the specified
391 :     protein.
392 :    
393 :     =over 4
394 :    
395 :     =item protID
396 :    
397 :     ID of the protein of interest.
398 :    
399 :     =item RETURN
400 :    
401 :     Returns a list of the Identifiers for the specific protein or for genes that
402 :     produce the specific protein.
403 :    
404 :     =back
405 :    
406 :     =cut
407 :    
408 :     sub IdsForProtein {
409 :     # Get the parameters.
410 :     my ($self, $protID) = @_;
411 :     # We'll put the identifiers found in here.
412 :     my %retVal;
413 :     # Ask for identifiers that directly name the protein.
414 :     for my $id ($self->GetFlat("ProteinSequence IsNamedBy Identifier",
415 :     "ProteinSequence(id) = ?", [$protID],
416 :     'Identifier(id)')) {
417 :     $retVal{$id} = 1;
418 :     }
419 :     # Add identifiers that name genes producing the protein.
420 :     for my $id ($self->GetFlat("ProteinSequence IsProteinFor Feature IsIdentifiedBy Identifier",
421 :     "ProteinSequence(id) = ?", [$protID],
422 :     'Identifier(id)')) {
423 :     $retVal{$id} = 1;
424 :     }
425 :     # Return the results found.
426 :     return sort keys %retVal;
427 :     }
428 :    
429 : parrello 1.16
430 : parrello 1.11 =head3 ComputeDNA
431 :    
432 :     my $dna = $sap->ComputeDNA($location);
433 :    
434 :     Return the DNA sequence for the specified location.
435 :    
436 :     =over 4
437 :    
438 :     =item location
439 :    
440 :     A L<BasicLocation> object indicating the contig, start location, direction, and
441 :     length of the desired DNA segment.
442 :    
443 :     =item RETURN
444 :    
445 : parrello 1.13 Returns a string containing the desired DNA. The DNA comes back in pure lower-case.
446 : parrello 1.11
447 :     =back
448 :    
449 :     =cut
450 :    
451 :     sub ComputeDNA {
452 :     # Get the parameters.
453 :     my ($self, $location) = @_;
454 : parrello 1.20 # Get the contig, left end, and right end of the location. Note we subtract
455 :     # 1 to convert contig positions to string offsets.
456 : parrello 1.11 my $contig = $location->Contig;
457 : parrello 1.20 my $left = $location->Left - 1;
458 :     my $right = $location->Right - 1;
459 : parrello 1.31 # Insure the left location is valid.
460 :     if ($left < 0) {
461 :     $left = 0;
462 :     }
463 : parrello 1.11 # Get the DNA segment length.
464 :     my $maxSequenceLength = $self->TuningParameter("maxSequenceLength");
465 :     # Compute the key of the first segment of our DNA and the starting
466 :     # point in that segment.
467 :     my $leftOffset = $left % $maxSequenceLength;
468 :     my $leftKey = "$contig:" . Tracer::Pad(($left - $leftOffset)/$maxSequenceLength,
469 :     7, 1, '0');
470 :     # Compute the key of the last segment containing our DNA.
471 :     my $rightKey = "$contig:" . Tracer::Pad(int($right/$maxSequenceLength), 7, 1, '0');
472 :     my @results = $self->GetFlat("DNASequence",
473 : parrello 1.26 'DNASequence(id) >= ? AND DNASequence(id) <= ?',
474 : parrello 1.11 [$leftKey, $rightKey], 'sequence');
475 :     # Form all the DNA into a string and extract our piece.
476 :     my $retVal = substr(join("", @results), $leftOffset, $location->Length);
477 :     # If this is a backwards string, we need the reverse complement.
478 :     rev_comp(\$retVal) if $location->Dir eq '-';
479 :     # Return the result.
480 :     return $retVal;
481 :     }
482 :    
483 : parrello 1.12 =head3 FilterByGenome
484 :    
485 :     my @filteredFids = $sapling->FilterByGenome(\@fids, $genomeFilter);
486 :    
487 :     Filter the features using the specified genome-based criterion. The
488 :     criterion can be either a comma-separated list of genome IDs, or a
489 :     partial organism name.
490 :    
491 :     =over 4
492 :    
493 :     =item fids
494 :    
495 :     Reference to a list of feature IDs.
496 :    
497 :     =item genomeFilter
498 :    
499 :     A string specifying the filtering criterion. If undefined or blank, then
500 :     no filter is applied. If a name, then only features from genomes with a
501 :     matching name will be returned. A name is a match if the filter is an
502 :     exact match for some prefix of the organism name. Thus, C<Listeria> would
503 :     get all Listerias, while C<Listeria monocytogenes EGD-e> would match only
504 :     the specific EGD-e strain. For a more precise match, you can specify
505 :     instead a comma-delimited list of genome IDs. In this latter case, only
506 :     features for the listed genomes will be included in the results.
507 :    
508 :     =item RETURN
509 :    
510 :     Returns the features from the incoming list that match the filter condition.
511 :    
512 :     =back
513 :    
514 :     =cut
515 :    
516 :     sub FilterByGenome {
517 :     # Get the parameters.
518 :     my ($self, $fids, $genomeFilter) = @_;
519 :     # Declare the return variable.
520 :     my @retVal;
521 :     # Check the type of filter.
522 :     if (! $genomeFilter) {
523 :     # No filter, so copy the input directly to the result.
524 :     @retVal = @$fids;
525 :     } else {
526 :     # Trim edge spaces from the filter.
527 :     $genomeFilter = Tracer::Trim($genomeFilter);
528 :     # This hash will contain the permissible genome IDs.
529 :     my %genomeIDs;
530 :     # Check for a name. We assume we have a name if there's an
531 :     # alphabetic letter anywhere in the filter string.
532 :     if ($genomeFilter =~ /[a-zA-Z]/) {
533 :     # The filter contains something that does not look like a genome
534 :     # ID, so it is treated as a genome name. We get the IDs of the
535 :     # genomes with that name and put them in the hash.
536 :     %genomeIDs = map { $_ => 1 } $self->GetFlat("Genome",
537 :     'Genome(scientific-name) LIKE ?',
538 :     ["$genomeFilter%"], 'id');
539 :     } else {
540 :     # We are expecting a comma-delimited list of genome IDs, so we
541 :     # put these in our hash.
542 :     %genomeIDs = map { $_ => 1 } split(/\s*,\s*/, $genomeFilter);
543 :     }
544 :     # Now we loop through the features, keeping the ones whose genome ID
545 :     # matches something in the hash.
546 :     @retVal = grep { $genomeIDs{genome_of($_)} } @$fids;
547 :     }
548 :     # Return the result.
549 :     return @retVal;
550 :     }
551 :    
552 :    
553 : parrello 1.11 =head3 GetLocations
554 :    
555 :     my @locs = $sapling->GetLocations($fid);
556 :    
557 :     Return the locations of the DNA for the specified feature.
558 :    
559 :     =over 4
560 :    
561 :     =item fid
562 :    
563 :     ID of the feature whose location is desired.
564 :    
565 :     =item RETURN
566 :    
567 :     Returns a list of L<BasicLocation> objects for the locations containing the
568 :     feature's DNA.
569 :    
570 :     =back
571 :    
572 :     =cut
573 :    
574 :     sub GetLocations {
575 :     # Get the parameters.
576 :     my ($self, $fid) = @_;
577 :     # Declare the return variable.
578 :     my @retVal;
579 : parrello 1.23 # This will contain the last location found.
580 :     my $lastLoc;
581 : parrello 1.11 # Get this feature's locations.
582 :     my $qh = $self->Get("IsLocatedIn",
583 :     'IsLocatedIn(from-link) = ? ORDER BY IsLocatedIn(ordinal)',
584 :     [$fid]);
585 :     while (my $resultRow = $qh->Fetch()) {
586 :     # Compute the contig ID and other information.
587 :     my $contig = $resultRow->PrimaryValue('to-link');
588 :     my $begin = $resultRow->PrimaryValue('begin');
589 :     my $dir = $resultRow->PrimaryValue('dir');
590 :     my $len = $resultRow->PrimaryValue('len');
591 :     # Create a location from the location information.
592 :     my $start = ($dir eq '+' ? $begin : $begin + $len - 1);
593 : parrello 1.23 my $loc = BasicLocation->new($contig, $start, $dir, $len);
594 :     # Check to see if this location is adjacent to the previous one.
595 :     if ($lastLoc && $lastLoc->Adjacent($loc)) {
596 :     # It is, so merge it in.
597 :     $lastLoc->Merge($loc);
598 :     } else {
599 :     # It isn't, so push the new one on the list.
600 :     $lastLoc = $loc;
601 :     push @retVal, $loc;
602 :     }
603 : parrello 1.11 }
604 :     # Return the result.
605 :     return @retVal;
606 :     }
607 :    
608 :    
609 : parrello 1.8 =head3 IdentifiedProtein
610 :    
611 :     my $proteinID = $sap->IdentifiedProtein($id);
612 :    
613 :     Compute the protein for a specified identifier. If the identifier does
614 :     not exist or does not identify a protein, this method will return
615 :     C<undef>.
616 :    
617 :     =over 4
618 :    
619 :     =item id
620 :    
621 :     Identifier whose protein is desired.
622 :    
623 :     =item RETURN
624 :    
625 :     Returns the protein ID corresponding to the incoming identifier,
626 :     or C<undef> if the identifier does not exist or is not for a protein.
627 :    
628 :     =back
629 :    
630 :     =cut
631 :    
632 :     sub IdentifiedProtein {
633 :     # Get the parameters.
634 :     my ($self, $id) = @_;
635 :     # Declare the return variable.
636 :     my $retVal;
637 :     # Try to find a protein for this ID.
638 :     my ($proteinID) = $self->GetFlat("Identifier Names ProteinSequence",
639 :     "Identifier(id) = ?", [$id],
640 :     'ProteinSequence(id)');
641 :     if (defined $proteinID) {
642 :     # We found one, so we're done.
643 :     $retVal = $proteinID;
644 :     } else {
645 :     # Not a protein ID. See if it's the ID of a feature that has a
646 :     # protein connected. Note that it's possible to find more than one,
647 :     # but we're going to punt and pick the first.
648 :     ($proteinID) = $self->GetFlat("Identifier Identifies Feature Produces ProteinSequence",
649 :     "Identifier(id) = ? LIMIT 1", [$id],
650 :     'ProteinSequence(id)');
651 :     if (defined $proteinID) {
652 :     # We found a protein ID, so return it.
653 :     $retVal = $proteinID;
654 :     }
655 :     }
656 :     # Return the result.
657 :     return $retVal;
658 :     }
659 :    
660 : parrello 1.9 =head3 FeaturesByID
661 :    
662 :     my @fids = $sapling->FeaturesByID($id);
663 :    
664 :     Return all the features corresponding to the specified identifier. Only features
665 :     that represent the same locus will be returned.
666 :    
667 :     =over 4
668 :    
669 :     =item id
670 :    
671 :     Identifier of interest.
672 :    
673 :     =item RETURN
674 :    
675 :     Returns a list of all the features in the database that match the given
676 :     identifier.
677 :    
678 :     =back
679 :    
680 :     =cut
681 :    
682 :     sub FeaturesByID {
683 :     # Get the parameters.
684 :     my ($self, $id) = @_;
685 :     # Ask for features from the database.
686 :     my @retVal = $self->GetFlat("Identifies", "Identifies(from-link) = ?", [$id],
687 :     'to-link');
688 :     # Return the result.
689 :     return @retVal;
690 :     }
691 :    
692 :     =head3 ProteinsByID
693 :    
694 :     my @fids = $sapling->ProteinsByID($id);
695 :    
696 :     Return all the features that have the same protein sequence as the
697 : parrello 1.10 identified feature. The returned features mar or may not have the same locus. If
698 :     the identifier is not for a protein encoding gene, no result will be returned.
699 : parrello 1.9
700 :     =over 4
701 :    
702 :     =item id
703 :    
704 : parrello 1.10 Identifier of interest. This can be any alias identifier from the B<Identifier>
705 :     table (which includes the FIG ID).
706 : parrello 1.9
707 :     =item RETURN
708 :    
709 : parrello 1.10 Returns a list of FIG IDs for features having the same protein sequence. If the
710 :     identifier does not specify a protein-encoding gene, the list will be empty.
711 : parrello 1.9
712 :     =back
713 :    
714 :     =cut
715 :    
716 :     sub ProteinsByID {
717 :     # Get the parameters.
718 :     my ($self, $id) = @_;
719 :     # Declare the return variable.
720 :     my @retVal;
721 : parrello 1.10 # Compute the protein for this identifier.
722 :     my $protID = $self->IdentifiedProtein($id);
723 :     # Only proceed if a protein was found. If no protein was found, we're
724 :     # already set up to return an empty list.
725 :     if (defined $protID) {
726 :     # Get all the features connected to the identified protein.
727 :     @retVal = $self->GetFlat("IsProteinFor", "IsProteinFor(from-link) = ?",
728 :     [$protID], "IsProteinFor(to-link)");
729 :     }
730 : parrello 1.9 # Return the result.
731 :     return @retVal;
732 :     }
733 :    
734 :     =head3 GetSubsystem
735 :    
736 :     my $ssData = $sapling->GetSubsystem($ssName);
737 :    
738 : parrello 1.11 Return a L<SaplingSubsys> object for the named subsystem.
739 : parrello 1.9
740 :     =over 4
741 : parrello 1.8
742 : parrello 1.9 =item ssName
743 :    
744 :     Name of the desired subsystem.
745 :    
746 :     =item RETURN
747 :    
748 :     Returns an object that defines multiple useful methods for manipulating the
749 :     named subsystem.
750 :    
751 :     =back
752 :    
753 :     =cut
754 :    
755 :     sub GetSubsystem {
756 :     # Get the parameters.
757 :     my ($self, $ssName) = @_;
758 :     # Declare the return variable.
759 :     require SaplingSubsys;
760 :     my $retVal = SaplingSubsys->new($ssName, $self);
761 :     # Return the result.
762 :     return $retVal;
763 :     }
764 : parrello 1.8
765 :    
766 :     =head3 GenesInRegion
767 :    
768 :     my @pegs = $sap->GenesInRegion($location);
769 :    
770 :     Return a list of the IDs for the features that overlap the specified
771 :     region on a contig.
772 :    
773 :     =over 4
774 :    
775 :     =item location
776 :    
777 :     Location of interest, either in the form of a location string (e.g.
778 : parrello 1.11 C<360108.3:NZ_AANK01000002_264528_264007>) or a L<BasicLocation>
779 : parrello 1.8 object.
780 :    
781 :     =item RETURN
782 :    
783 :     Returns a list of feature IDs. The features in the list will be all
784 :     those that overlap or occur inside the location of interest.
785 :    
786 :     =back
787 :    
788 :     =cut
789 :    
790 :     sub GenesInRegion {
791 :     # Get the parameters.
792 :     my ($self, $location) = @_;
793 :     # Insure we have a location object.
794 :     my $locObject = (ref $location ? $location : BasicLocation->new($location));
795 :     # Get the beginning and the end of the location of interest.
796 :     my $begin = $locObject->Left();
797 :     my $end = $locObject->Right();
798 :     # For performance reasons, we limit the possible starting location, using the
799 :     # tuning parameter for maximum location length.
800 :     my $limit = $begin - $self->TuningParameter('maxLocationLength');
801 :     # Perform the query. Note we use a hash to eliminate duplicates.
802 :     my %retVal = map { $_ => 1 } $self->GetFlat('Contig IsLocusFor Feature',
803 :     "Contig(id) = ? AND IsLocusFor(begin) <= ? AND " .
804 :     "IsLocusFor(begin) > ? AND " .
805 :     "IsLocusFor(begin) + IsLocusFor(len) >= ?",
806 :     [$locObject->Contig(), $end, $limit, $begin],
807 :     'Feature(id)');
808 :     # Return the result.
809 :     return sort keys %retVal;
810 :     }
811 :    
812 :     =head3 GetFasta
813 :    
814 : parrello 1.9 my $fasta = $sapling->GetFasta($proteinID, $id, $comment);
815 : parrello 1.8
816 :     Return a FASTA sequence for the specified protein. An optional identifier
817 :     can be provided to be used as the identification string.
818 :    
819 :     =over 4
820 :    
821 :     =item proteinID
822 :    
823 :     Protein sequence identifier.
824 :    
825 :     =item id (optional)
826 :    
827 :     The identifier to be used in the FASTA output. If omitted, the protein ID
828 :     is used.
829 :    
830 : parrello 1.9 =item comment (optional)
831 :    
832 :     The comment string to be used in the identification line of the FASTA output.
833 :     If omitted, no comment will be present.
834 :    
835 : parrello 1.8 =item RETURN
836 :    
837 :     Returns a FASTA string for the protein. This includes the identification
838 : parrello 1.14 line and the protein letters themselves.
839 : parrello 1.8
840 :     =back
841 :    
842 :     =cut
843 :    
844 :     sub GetFasta {
845 :     # Get the parameters.
846 : parrello 1.9 my ($self, $proteinID, $id, $comment) = @_;
847 : parrello 1.11 # Compute the identifier.
848 : parrello 1.8 my $realID = $id || "md5|$proteinID";
849 :     # Declare the return variable.
850 :     my $retVal;
851 :     # Get the protein sequence.
852 :     my ($sequence) = $self->GetFlat("ProteinSequence", "ProteinSequence(id) = ?",
853 :     [$proteinID], "sequence");
854 :     # It's an error if the sequence was not found.
855 :     if (! defined $sequence) {
856 :     Confess("No protein found with the sequence identifier $proteinID.");
857 :     } else {
858 : parrello 1.11 # Create a FASTA string for the protein.
859 :     $retVal = SeedUtils::create_fasta_record($realID, $comment, $sequence);
860 : parrello 1.8 }
861 :     # Return the result.
862 :     return $retVal;
863 :     }
864 : parrello 1.1
865 :    
866 : parrello 1.4 =head3 Taxonomy
867 :    
868 : parrello 1.14 my @taxonomy = $sap->Taxonomy($genomeID, $format);
869 : parrello 1.4
870 :     Return the full taxonomy of the specified genome, starting from the
871 : parrello 1.14 domain downward.
872 : parrello 1.4
873 :     =over 4
874 :    
875 :     =item genomeID
876 :    
877 :     ID of the genome whose taxonomy is desired. The genome does not need to exist
878 :     in the database: the version number will be lopped off and the result used as
879 :     an entry point into the taxonomy tree.
880 :    
881 : parrello 1.14 =item format (optional)
882 :    
883 :     Format of the taxonomy. C<names> will return primary names, C<numbers> will
884 :     return taxonomy numbers, and C<both> will return taxonomy number followed by
885 :     primary name. The default is C<names>.
886 :    
887 : parrello 1.4 =item RETURN
888 :    
889 :     Returns a list of taxonomy names, starting from the domain and moving
890 :     down to the node where the genome is attached.
891 :    
892 :     =back
893 :    
894 :     =cut
895 :    
896 :     sub Taxonomy {
897 :     # Get the parameters.
898 : parrello 1.14 my ($self, $genomeID, $format) = @_;
899 : parrello 1.4 # Get the genome's taxonomic group.
900 :     my ($taxon) = split /\./, $genomeID, 2;
901 :     # We'll put the return data in here.
902 :     my @retVal;
903 :     # Loop until we hit a domain.
904 :     my $domainFlag;
905 :     while (! $domainFlag) {
906 :     # Get the data we need for this taxonomic group.
907 :     my ($taxonData) = $self->GetAll('TaxonomicGrouping IsInGroup',
908 :     'TaxonomicGrouping(id) = ?', [$taxon],
909 :     'domain scientific-name IsInGroup(to-link)');
910 :     # If we didn't find what we're looking for, then we have a problem. This
911 :     # would indicate a node below the domain level that doesn't have a parent
912 :     # or (more likely) an invalid input string.
913 :     if (! $taxonData) {
914 :     # Terminate the loop and trace a warning.
915 :     $domainFlag = 1;
916 :     Trace("Could not find node or parent for \"$taxon\".") if T(1);
917 :     } else {
918 :     # Extract the data for the current group. Note we overwrite our
919 :     # taxonomy ID with the ID of our parent, priming the next iteration
920 :     # of the loop.
921 :     my $name;
922 : parrello 1.14 my $oldTaxon = $taxon;
923 : parrello 1.4 ($domainFlag, $name, $taxon) = @$taxonData;
924 : parrello 1.14 # Compute the value we want to put in the output list.
925 :     my $value;
926 :     if ($format eq 'numbers') {
927 :     $value = $oldTaxon;
928 :     } elsif ($format eq 'both') {
929 :     $value = "$oldTaxon $name";
930 :     } else {
931 :     $value = $name;
932 :     }
933 :     # Put the current group's data in the return list.
934 :     unshift @retVal, $value;
935 : parrello 1.4 }
936 :     }
937 :     # Return the result.
938 :     return @retVal;
939 :     }
940 :    
941 : parrello 1.18 =head3 IsDeletedFid
942 :    
943 :     my $flag = $sapling->IsDeletedFid($fid);
944 :    
945 :     Return TRUE if the specified feature is B<not> in the database, else
946 :     FALSE.
947 :    
948 :     =over 4
949 :    
950 :     =item fid
951 :    
952 :     FIG ID of the relevant feature.
953 :    
954 :     =item RETURN
955 :    
956 :     Returns TRUE if the specified feature is in the database, else FALSE.
957 :    
958 :     =back
959 :    
960 :     =cut
961 :    
962 :     sub IsDeletedFid {
963 :     # Get the parameters.
964 :     my ($self, $fid) = @_;
965 :     # Check for the feature. If the feature does not exist, we'll get an
966 :     # undefined value (FALSE). If it does, we'll get the feature ID itself
967 :     # (TRUE).
968 :     my ($retVal) = $self->GetFlat("Feature", "Feature(id) = ?", [$fid], 'id');
969 :     # Return FALSE if the feature was found, TRUE if it was not found.
970 :     return ($retVal ? 0 : 1);
971 :     }
972 :    
973 : parrello 1.4
974 : parrello 1.1 =head3 GenomeHash
975 :    
976 :     my $genomeHash = $sap->GenomeHash();
977 :    
978 :     Return a hash of the genomes configured to be in this database. The list
979 :     is either taken from the active SEED database or from a configuration
980 :     file in the data directory. The hash maps genome IDs to TRUE.
981 :    
982 :     =cut
983 :    
984 :     sub GenomeHash {
985 :     # Get the parameters.
986 :     my ($self) = @_;
987 :     # We'll build the hash in here.
988 :     my %genomeHash;
989 :     # Do we already have a list?
990 :     if (! defined $self->{genomeHash}) {
991 :     # No, check for a configuration file.
992 :     my $xml = $self->ReadConfigFile();
993 :     if (defined $xml && $xml->{Genomes}) {
994 :     # We found one and it has a genome list, so extract the genomes.
995 :     %genomeHash = map { $_ => 1 } grep { $_ =~ /\S/ } split /\s+/, $xml->{Genomes};
996 :     } else {
997 :     # No, so get the genome list.
998 :     my $fig = $self->GetSourceObject();
999 : parrello 1.8 my @genomes = $fig->genomes();
1000 : parrello 1.1 # Verify the genome list to insure every genome has an organism
1001 :     # directory.
1002 :     for my $genome (@genomes) {
1003 :     if (-d "$FIG_Config::organisms/$genome") {
1004 :     $genomeHash{$genome} = 1;
1005 :     }
1006 :     }
1007 :     }
1008 :     # Store the genomes in this object.
1009 :     $self->{genomeHash} = \%genomeHash;
1010 :     }
1011 :     # Return the result.
1012 :     return $self->{genomeHash};
1013 :     }
1014 :    
1015 :     =head3 SubsystemID
1016 :    
1017 :     my $subID = $sap->SubsystemID($subName);
1018 :    
1019 :     Return the ID of the subsystem with the specified name.
1020 :    
1021 :     =over 4
1022 :    
1023 :     =item subName
1024 :    
1025 :     Name of the relevant subsystem. A subsystem name with underscores for spaces
1026 :     will return the same ID as a subsystem name with the spaces still in it.
1027 :    
1028 :     =item RETURN
1029 :    
1030 : parrello 1.4 Returns a normalized subsystem name.
1031 : parrello 1.1
1032 :     =back
1033 :    
1034 :     =cut
1035 :    
1036 :     sub SubsystemID {
1037 :     # Get the parameters.
1038 :     my ($self, $subName) = @_;
1039 : parrello 1.4 # Normalize the subsystem name by converting underscores to spaces.
1040 : parrello 1.37 # Underscores at the end are not converted.
1041 : parrello 1.4 my $retVal = $subName;
1042 : parrello 1.38 my $trailer = chop $retVal;
1043 :     $retVal =~ tr/_/ /;
1044 :     $retVal .= $trailer;
1045 : parrello 1.1 # Return the result.
1046 :     return $retVal;
1047 :     }
1048 :    
1049 : parrello 1.14 =head3 Alias
1050 :    
1051 :     my $translatedID = $sap->Alias($fid, $source);
1052 :    
1053 :     Return an alternate ID of the specified type for the specified feature.
1054 :     If no alternate ID of that type exists, the incoming value will be
1055 :     returned unchanged.
1056 :    
1057 :     =over 4
1058 :    
1059 :     =item fid
1060 :    
1061 :     FIG ID of the feature whose alias identifier is desired.
1062 :    
1063 :     =item source
1064 :    
1065 :     Database type for the alternate ID (e.g. C<LocusTag>, C<NCBI>, C<RefSeq>). If
1066 :     C<SEED> is specified, the ID will be returned unchanged and no database lookup
1067 :     will occur.
1068 :    
1069 :     =item RETURN
1070 :    
1071 :     Returns an equivalent ID for the specified feature that belongs to the specified
1072 :     database (that is, has the specified source). If no such ID exists, returns the
1073 :     incoming ID.
1074 :    
1075 :     =back
1076 :    
1077 :     =cut
1078 :    
1079 :     sub Alias {
1080 :     # Get the parameters.
1081 :     my ($self, $fid, $source) = @_;
1082 :     # Default to the incoming value.
1083 :     my $retVal = $fid;
1084 :     # We only have work to do if the database type isn't "SEED".
1085 :     if ($source ne 'SEED') {
1086 :     # Look for an alias.
1087 :     my ($alias) = $self->GetFlat("IsIdentifiedBy Identifier",
1088 : parrello 1.15 'IsIdentifiedBy(from-link) = ? AND Identifier(source) = ?',
1089 : parrello 1.14 [$fid, $source], 'Identifier(natural-form)');
1090 :     # If we found one, return it.
1091 :     if (defined $alias) {
1092 :     $retVal = $alias;
1093 :     }
1094 :     }
1095 :     # Return the result.
1096 :     return $retVal;
1097 :     }
1098 :    
1099 :    
1100 :     =head3 ContigLength
1101 :    
1102 :     my $contigLen = $sap->ContigLength($contigID);
1103 :    
1104 :     Return the number of base pairs in the specified contig.
1105 :    
1106 :     =over 4
1107 :    
1108 :     =item contigID
1109 :    
1110 :     ID of the contig of interest.
1111 :    
1112 :     =item RETURN
1113 :    
1114 :     Returns the number of base pairs in the specified contig, or 0 if the contig
1115 :     does not exist.
1116 :    
1117 :     =back
1118 :    
1119 :     =cut
1120 :    
1121 :     sub ContigLength {
1122 :     # Get the parameters.
1123 :     my ($self, $contigID) = @_;
1124 :     # Try to find the length.
1125 :     my ($retVal) = $self->GetEntityValues(Contig => $contigID, 'length');
1126 :     # Convert not-found to 0.
1127 :     $retVal = 0 if ! defined $retVal;
1128 :     # Return the result.
1129 :     return $retVal;
1130 :     }
1131 :    
1132 :    
1133 :     =head2 Configuration-Related Methods
1134 :    
1135 : parrello 1.1 =head3 SubsystemHash
1136 :    
1137 :     my $subHash = $sap->SubsystemHash();
1138 :    
1139 :     Return a hash of the subsystems configured to be in this database. The
1140 :     list is either taken from the active SEED database or from a
1141 :     configuration file in the data directory. The hash maps subsystem names
1142 :     to TRUE.
1143 :    
1144 :     =cut
1145 :    
1146 :     sub SubsystemHash {
1147 :     # Get the parameters.
1148 :     my ($self) = @_;
1149 :     # We'll build the hash in here.
1150 :     my %subHash;
1151 :     # Do we already have a list?
1152 :     if (! defined $self->{subHash}) {
1153 :     # No, check for a configuration file.
1154 :     my $xml = $self->ReadConfigFile();
1155 :     if (defined $xml && $xml->{Subsystems}) {
1156 :     # We found one, and it has subsystems, so we extract them.
1157 :     # A little dancing is necessary to trim spaces.
1158 :     my @subs = map { $_ =~ /\s*(\S.+\S)/; $1 } split /\n/, $xml->{Subsystems};
1159 :     # Here we need to clear out any null subsystem names resulting from
1160 :     # blank lines in the file.
1161 :     %subHash = map { $_ => 1 } grep { $_ } @subs;
1162 :     } else {
1163 :     # No config file, so we ask the FIG object.
1164 :     my $fig = $self->GetSourceObject();
1165 : parrello 1.36 for my $subsystem ($fig->all_subsystems()) {
1166 :     my $subsysID = $self->SubsystemID($subsystem);
1167 : parrello 1.37 $subHash{$subsysID} = 1;
1168 :     }
1169 : parrello 1.1 }
1170 :     # Store the subsystems in this object.
1171 :     $self->{subHash} = \%subHash;
1172 :     }
1173 :     # Return the result.
1174 :     return $self->{subHash};
1175 :     }
1176 :    
1177 :     =head3 TuningParameter
1178 :    
1179 :     my $parm = $erdb->TuningParameter($parmName);
1180 :    
1181 :     Return the value of the specified tuning parameter. Tuning parameters are
1182 :     read from the XML configuration file.
1183 :    
1184 :     =over 4
1185 :    
1186 :     =item parmName
1187 :    
1188 :     Name of the parameter whose value is desired.
1189 :    
1190 :     =item RETURN
1191 :    
1192 :     Returns the paramter value.
1193 :    
1194 :     =back
1195 :    
1196 :     =cut
1197 :    
1198 :     sub TuningParameter {
1199 :     # Get the parameters.
1200 :     my ($self, $parmName) = @_;
1201 :     # Insure we have the parameters in memory.
1202 :     if (! defined $self->{tuning}) {
1203 :     # Read the configuration file.
1204 :     my $configFile = $self->ReadConfigFile();
1205 :     # Get the tuning parameters (if any).
1206 :     my $tuning;
1207 :     if (! defined $configFile || ! exists $configFile->{TuningParameters}) {
1208 :     $tuning = {};
1209 :     } else {
1210 :     $tuning = $configFile->{TuningParameters};
1211 :     }
1212 :     # Merge in the default option values.
1213 :     Tracer::MergeOptions($tuning, TUNING_DEFAULTS);
1214 :     # Save the result in our object.
1215 :     $self->{tuning} = $tuning;
1216 :     }
1217 :     # Extract the tuning paramter.
1218 :     my $retVal = $self->{tuning}{$parmName};
1219 :     # Throw an error if it does not exist.
1220 :     Confess("Invalid tuning parameter \"$parmName\".") if ! defined $retVal;
1221 :     # Return the result.
1222 :     return $retVal;
1223 :     }
1224 :    
1225 :    
1226 :     =head3 ReadConfigFile
1227 :    
1228 :     my $xmlObject = $sap->ReadConfigFile();
1229 :    
1230 :     Return the hash structure created from reading the configuration file, or
1231 :     an undefined value if the file is not found.
1232 :    
1233 :     =cut
1234 :    
1235 :     sub ReadConfigFile {
1236 :     my ($self) = @_;
1237 :     # Declare the return variable.
1238 :     my $retVal;
1239 :     # Compute the configuration file name.
1240 :     my $fileName = "$self->{loadDirectory}/SaplingConfig.xml";
1241 :     # Did we find it?
1242 :     if (-f $fileName) {
1243 :     # Yes, read it in.
1244 :     $retVal = XMLin($fileName);
1245 :     }
1246 :     # Return the result.
1247 :     return $retVal;
1248 :     }
1249 :    
1250 :     =head3 GlobalSection
1251 :    
1252 :     my $flag = $sap->GlobalSection($name);
1253 :    
1254 :     Return TRUE if the specified section name is the global section, FALSE
1255 :     otherwise.
1256 :    
1257 :     =over 4
1258 :    
1259 :     =item name
1260 :    
1261 :     Section name to test.
1262 :    
1263 :     =item RETURN
1264 :    
1265 : parrello 1.4 Returns TRUE if the parameter matches the GLOBAL constant, else FALSE.
1266 : parrello 1.1
1267 :     =back
1268 :    
1269 :     =cut
1270 :    
1271 :     sub GlobalSection {
1272 :     # Get the parameters.
1273 :     my ($self, $name) = @_;
1274 :     # Return the result.
1275 :     return ($name eq GLOBAL);
1276 :     }
1277 :    
1278 : parrello 1.35 =head3 LoadGenome
1279 :    
1280 :     my $stats = $sap->LoadGenome($genome, $directory);
1281 :    
1282 :     Load the specified genome directory into the database. The genome's DNA, features,
1283 :     protein sequences, and other supporting information will be inserted. If the
1284 :     genome already exists, numerous errors will occur; therefore, it is recommended
1285 :     that the genome be deleted first using the L<ERDB/Delete> method.
1286 :    
1287 :     =over 4
1288 :    
1289 :     =item genom
1290 :    
1291 :     The ID of the genome being loaded.
1292 :    
1293 :     =item directory
1294 :    
1295 :     Name of the genome directory.
1296 :    
1297 :     =item RETURN
1298 :    
1299 :     Returns a statistics object describing the load activity.
1300 :    
1301 :     =back
1302 :    
1303 :     =cut
1304 :    
1305 :     sub LoadGenome {
1306 :     # Get the parameters.
1307 :     my ($self, $genome, $directory) = @_;
1308 :     # Verify that the directory exists.
1309 :     Confess ("Genome directory $directory not found.") if ! -d $directory;
1310 :     # Verify that the ID is valid.
1311 :     Confess("Invalid genome ID $genome.") if $genome !~ /^\d+\.\d+$/;
1312 :     # Import the loader and call it.
1313 :     require SaplingGenomeLoader;
1314 :     my $retVal = SaplingGenomeLoader::Load($self, $genome, $directory);
1315 :     # Return the statistics object.
1316 :     return $retVal;
1317 :     }
1318 :    
1319 : parrello 1.14 =head2 Special-Purpose Methods
1320 :    
1321 :     =head3 ComputeFeatureFilter
1322 :    
1323 : parrello 1.30 my ($objects, $filter, @parms) = $sap->ComputeFeatureFilter($source, $genome);
1324 : parrello 1.14
1325 :     Compute the initial object name list, filter string, and parameter list
1326 :     for a query by feature ID. The object name list will always end with the
1327 :     B<Feature> entity, and the combination of the filter string and parameter
1328 :     list will translate the incoming ID from the specified format to a real
1329 :     FIG feature ID. If the specified format B<is> FIG feature IDs, then the
1330 :     query will start on the B<Feature> entity; otherwise, it will start with
1331 :     the B<Identifier> entity. This is a special-purpose method that performs
1332 :     the task of intelligently modifying queries to allow for external ID
1333 :     types.
1334 :    
1335 :     =over 4
1336 :    
1337 :     =item source (optional)
1338 :    
1339 :     Database source of the IDs specified-- C<SEED> for FIG IDs, C<GENE> for standard
1340 :     gene identifiers, or C<LocusTag> for locus tags. In addition, you may specify
1341 :     C<RefSeq>, C<CMR>, C<NCBI>, C<Trembl>, or C<UniProt> for IDs from those databases.
1342 :     Use C<mixed> to allow mixed ID types (though this may cause problems when the same
1343 : parrello 1.22 ID has different meanings in different databases). Use C<prefixed> to allow IDs with
1344 :     prefixing indicating the ID type (e.g. C<uni|P00934> for a UniProt ID, C<gi|135813> for
1345 :     an NCBI identifier, and so forth). The default is C<SEED>.
1346 : parrello 1.14
1347 : parrello 1.30 =item genome (optional)
1348 :    
1349 :     ID of a genome. If specified, only features from the specified genome will be
1350 :     accepted by the filter. This is important for IDs that are ambiguous between
1351 :     genomes (like Locus Tags). If omitted, no genome filtering will take place.
1352 :    
1353 : parrello 1.14 =item RETURN
1354 :    
1355 :     Returns a list containing parameters to the desired query call. The first element
1356 :     is the prefix for the object name list, the second is the prefix for the filter
1357 :     string, and the subsequent elements form the prefix for the parameter value list.
1358 :    
1359 :     =back
1360 :    
1361 :     =cut
1362 :    
1363 :     sub ComputeFeatureFilter {
1364 :     # Get the parameters.
1365 : parrello 1.30 my ($self, $source, $genome) = @_;
1366 : parrello 1.14 # Declare the return variables.
1367 :     my ($objects, $filter, @parms);
1368 : parrello 1.30 # This will be set to TRUE if we are directly processing FIG IDs.
1369 :     my $figOnly = 0;
1370 : parrello 1.14 # Determine the source type.
1371 :     if (! defined $source || $source eq 'SEED') {
1372 :     # Here we're directly processing FIG IDs.
1373 :     $objects = 'Feature';
1374 : parrello 1.30 $filter = 'Feature(id) = ?';
1375 :     $figOnly = 1;
1376 : parrello 1.14 } elsif ($source eq 'mixed') {
1377 :     # Here we're processing mixed IDs of unknown type.
1378 :     $objects = 'Identifier Identifies Feature';
1379 :     $filter = 'Identifier(natural-form) = ?';
1380 : parrello 1.22 } elsif ($source eq 'prefixed') {
1381 :     # Here we're processing mixed IDs with prefixes. This is the internal form
1382 :     # of the ID.
1383 :     $objects = 'Identifier Identifies Feature';
1384 :     $filter = 'Identifier(id) = ?';
1385 : parrello 1.14 } else {
1386 :     # Here we're processing a fixed ID type from an external database.
1387 :     # This is the case that requires an additional parameter. Note that
1388 :     # we insist that the additional parameter matches the first parameter
1389 :     # mark.
1390 :     $objects = 'Identifier Identifies Feature';
1391 :     $filter = 'Identifier(source) = ? AND Identifier(natural-form) = ?';
1392 :     push @parms, $source;
1393 :     }
1394 : parrello 1.30 # Was a genome ID specified?
1395 :     if ($genome) {
1396 :     # Yes. Add genome filtering.
1397 :     if ($figOnly) {
1398 :     # In a FIG ID situation, we can simply add the genome filtering to the front
1399 :     # of the object list.
1400 :     $objects = "Genome IsOwnerOf $objects";
1401 :     } else {
1402 :     # Otherwise, we need to do an AND thing.
1403 :     $objects = "Genome IsOwnerOf Feature AND $objects";
1404 :     }
1405 :     # Add the genome ID to the filter clause.
1406 :     $filter = "Genome(id) = ? AND $filter";
1407 :     # Add it to the parameter list.
1408 :     unshift @parms, $genome;
1409 :     }
1410 : parrello 1.14 # Return the results.
1411 :     return ($objects, $filter, @parms);
1412 :     }
1413 :    
1414 :    
1415 :     =head3 FindGapLeft
1416 :    
1417 :     my @operonData = $sap->FindGapLeft($loc, $maxGap, $interval, \%redundancyHash, \$redundancyFlag);
1418 :    
1419 :     This method performs a rather arcane task: searching for a gap to the
1420 :     left of a location in the contig. The search will proceed from the
1421 :     starting point to the left, and will stop when a gap between occupied
1422 :     locations is found that is larger than the specified maximum. The caller
1423 :     has the option of specifying a hash of feature IDs that are redundant. If
1424 :     any feature in the hash is found, the search will stop early and the
1425 :     provided redundancy flag will be set. In addition, an interval size can
1426 :     be specified to tune the process of retrieving data from the database.
1427 :    
1428 :     =over 4
1429 :    
1430 :     =item loc
1431 :    
1432 :     L<BasicLocation> object for the location from which the search is to start.
1433 :     This gives us the contig ID, the strand of interest (forward or backward),
1434 :     and the starting point of the search.
1435 :    
1436 :     =item maxGap
1437 :    
1438 :     The maximum allowable gap. The search will stop at the left end of the contig
1439 :     or the first gap larger than this amount.
1440 :    
1441 :     =item interval (optional)
1442 :    
1443 :     Interval to use for retrieving data from the database. This is the size of
1444 :     the contig segments being retrieved. The default is C<10000>
1445 :    
1446 :     =item redundancyHash (optional)
1447 :    
1448 :     A hash of feature IDs. If any feature present in this hash is found during
1449 :     the search, the search will stop and no data will be returned. The default
1450 :     is an empty hash (no check).
1451 :    
1452 :     =item redundancyFlag (optional)
1453 :    
1454 :     A reference to a scalar flag. If present, the entire method will be bypassed
1455 :     if the flag is TRUE. If a redundancy hash is specified and a redundant feature
1456 :     is found, this flag will be set to TRUE by the method.
1457 :    
1458 :     =item RETURN
1459 :    
1460 :     Returns a list of 4-tuples. Each tuple will contain a feature ID, a begin
1461 :     offset, a direction (C<+> or C<->), and a length, representing an occupied
1462 :     location on the contig and the feature to which it belongs. The complete
1463 :     list of locations will be to the left of the starting location and relatively
1464 :     close together, with no gap larger than the caller-specified maximum.
1465 :    
1466 :     =back
1467 :    
1468 :     =cut
1469 :    
1470 :     sub FindGapLeft {
1471 :     # Get the parameters.
1472 :     my ($self, $loc, $maxGap, $interval, $redundancyHash, $redundancyFlag) = @_;
1473 :     # Declare the return variable.
1474 :     my @retVal;
1475 :     # Fix up defaults for the missing parameters.
1476 :     $interval ||= 10000;
1477 :     if (! defined $redundancyHash) {
1478 :     $redundancyHash = {};
1479 :     }
1480 :     my $fakeFlag = 0;
1481 :     if (! defined $redundancyFlag) {
1482 :     $redundancyFlag = \$fakeFlag;
1483 :     }
1484 :     # This flag will be set to TRUE if we run out of locations or find a gap.
1485 :     my $gapFound = 0;
1486 :     # This will be used to store tuples found. If we are successful, it will
1487 :     # be copied to the return list.
1488 :     my @operonData;
1489 :     # Now we need to set up some data for the loop. In particular, the contig
1490 :     # ID, the strand (direction), and the starting point. We add one to the
1491 :     # starting current position to insure that the starting point is included
1492 :     # in the first search.
1493 :     my $currentPosition = $loc->Left + 1;
1494 :     my $contigID = $loc->Contig;
1495 :     my $strand = $loc->Dir;
1496 :     # This variable keeps the leftmost begin location found.
1497 :     my $begin = $loc->Left;
1498 :     # Loop until we find a redundancy or a gap.
1499 :     while (! $$redundancyFlag && ! $gapFound && $currentPosition >= 0) {
1500 :     # Compute the limits of the search interval for this iteration.
1501 :     my $nextPosition = $currentPosition - $interval;
1502 :     # Get all the locations in the interval.
1503 :     my @rows = $self->GetAll("IsLocatedIn",
1504 :     'IsLocatedIn(to-link) = ? AND IsLocatedIn(dir) = ? AND IsLocatedIn(begin) >= ? AND IsLocatedIn(begin) < ?',
1505 :     [$contigID, $strand, $nextPosition, $currentPosition],
1506 :     [qw(from-link begin dir len)]);
1507 :     # If nothing was found, it's a gap.
1508 :     if (! @rows) {
1509 :     $gapFound = 1;
1510 :     } else {
1511 :     # We got something, so we can loop through looking for gaps. The search
1512 :     # requires we sort by right point.
1513 :     my @sortableTuples;
1514 :     for my $tuple (@rows) {
1515 :     my ($fid, $left, $dir, $len) = @$tuple;
1516 :     push @sortableTuples, [$left + $len, $tuple];
1517 :     }
1518 :     my @sortedTuples = map { $_->[1] } sort { -($a->[0] <=> $b->[0]) } @sortableTuples;
1519 :     # Loop through the tuples, stopping at the first redundancy or gap.
1520 :     for my $tuple (@sortedTuples) { last if $gapFound || $$redundancyFlag;
1521 :     # Get this tuple's data.
1522 :     my ($fid, $left, $dir, $len) = @$tuple;
1523 :     # Is it close enough to be counted?
1524 :     if ($begin - ($left + $len) <= $maxGap) {
1525 :     # Yes. We can include this tuple.
1526 :     push @operonData, $tuple;
1527 :     # Update the begin point.
1528 :     $begin = $left;
1529 :     # Is it redundant? It's only reasonable to ask this if it's
1530 :     # an included feature.
1531 :     if ($redundancyHash->{$fid}) {
1532 :     $$redundancyFlag = 1;
1533 :     }
1534 :     } else {
1535 :     # No, it's not close enough. We've found a gap.
1536 :     $gapFound = 1;
1537 :     }
1538 :     }
1539 :     }
1540 :     # Set up for the next interval.
1541 :     $currentPosition = $nextPosition;
1542 :     }
1543 :     # If we're nonredundant, save our results.
1544 :     if (! $$redundancyFlag) {
1545 :     @retVal = @operonData;
1546 :     }
1547 :     # Return the result.
1548 :     return @retVal;
1549 :     }
1550 :    
1551 :     =head3 FindGapRight
1552 : parrello 1.13
1553 : parrello 1.14 my @operonData = $sap->FindGapRight($loc, $maxGap, $interval, \%redundancyHash, \$redundancyFlag);
1554 : parrello 1.13
1555 : parrello 1.14 This method is the dual of L</FindGapLeft>: it searches for a gap to the
1556 :     right of a location in the contig. The search will proceed from the
1557 :     starting point to the right, and will stop when a gap between occupied
1558 :     locations is found that is larger than the specified maximum. The caller
1559 :     has the option of specifying a hash of feature IDs that are redundant. If
1560 :     any feature in the hash is found, the search will stop early and the
1561 :     provided redundancy flag will be set. In addition, an interval size can
1562 :     be specified to tune the process of retrieving data from the database.
1563 : parrello 1.13
1564 :     =over 4
1565 :    
1566 : parrello 1.14 =item loc
1567 :    
1568 :     L<BasicLocation> object for the location from which the search is to start.
1569 :     This gives us the contig ID, the strand of interest (forward or backward),
1570 :     and the starting point of the search.
1571 :    
1572 :     =item maxGap
1573 :    
1574 :     The maximum allowable gap. The search will stop at the right end of the contig
1575 :     or the first gap larger than this amount.
1576 :    
1577 :     =item interval (optional)
1578 :    
1579 :     Interval to use for retrieving data from the database. This is the size of
1580 :     the contig segments being retrieved. The default is C<10000>
1581 :    
1582 :     =item redundancyHash (optional)
1583 :    
1584 :     A hash of feature IDs. If any feature present in this hash is found during
1585 :     the search, the search will stop and no data will be returned. The default
1586 :     is an empty hash (no check).
1587 :    
1588 :     =item redundancyFlag (optional)
1589 : parrello 1.13
1590 : parrello 1.14 A reference to a scalar flag. If present, the entire method will be bypassed
1591 :     if the flag is TRUE. If a redundancy hash is specified and a redundant feature
1592 :     is found, this flag will be set to TRUE by the method.
1593 : parrello 1.13
1594 :     =item RETURN
1595 :    
1596 : parrello 1.14 Returns a list of 4-tuples. Each tuple will contain a feature ID, a begin
1597 :     offset, a direction (C<+> or C<->), and a length, representing an occupied
1598 :     location on the contig and the feature to which it belongs. The complete
1599 :     list of locations will be to the right of the starting location and relatively
1600 :     close together, with no gap larger than the caller-specified maximum.
1601 : parrello 1.13
1602 :     =back
1603 :    
1604 :     =cut
1605 :    
1606 : parrello 1.14 sub FindGapRight {
1607 : parrello 1.13 # Get the parameters.
1608 : parrello 1.14 my ($self, $loc, $maxGap, $interval, $redundancyHash, $redundancyFlag) = @_;
1609 :     # Declare the return variable.
1610 :     my @retVal;
1611 :     # Fix up defaults for the missing parameters.
1612 :     $interval ||= 10000;
1613 :     if (! defined $redundancyHash) {
1614 :     $redundancyHash = {};
1615 :     }
1616 :     my $fakeFlag = 0;
1617 :     if (! defined $redundancyFlag) {
1618 :     $redundancyFlag = \$fakeFlag;
1619 :     }
1620 :     # This flag will be set to TRUE if we run out of locations or find a gap.
1621 :     my $gapFound = 0;
1622 :     # This will be used to store tuples found. If we are successful, it will
1623 :     # be copied to the return list.
1624 :     my @operonData;
1625 :     # Now we need to set up some data for the loop. In particular, the contig
1626 :     # ID, the strand (direction), and the starting point. We subtract one from the
1627 :     # starting current position to insure that the starting point is included
1628 :     # in the first search.
1629 :     my $currentPosition = $loc->Left - 1;
1630 :     my $contigID = $loc->Contig;
1631 :     my $strand = $loc->Dir;
1632 :     # Get the length of the contig.
1633 :     my $contigLen = $self->ContigLength($contigID);
1634 : parrello 1.19 Trace("Contig length is $contigLen. Starting at $currentPosition.") if T(3);
1635 : parrello 1.14 # This variable keeps the rightmost end location found.
1636 :     my $endPoint = $loc->Left;
1637 :     # Loop until we find a redundancy or a gap.
1638 :     while (! $$redundancyFlag && ! $gapFound && $currentPosition <= $contigLen) {
1639 : parrello 1.19 Trace("Checking at $currentPosition.") if T(3);
1640 : parrello 1.14 # Compute the limits of the search interval for this iteration.
1641 :     my $nextPosition = $currentPosition + $interval;
1642 :     # Get all the locations in the interval.
1643 :     my @rows = $self->GetAll("IsLocatedIn",
1644 :     'IsLocatedIn(to-link) = ? AND IsLocatedIn(dir) = ? AND IsLocatedIn(begin) >= ? AND IsLocatedIn(begin) < ?',
1645 : parrello 1.19 [$contigID, $strand, $currentPosition, $nextPosition],
1646 : parrello 1.14 [qw(from-link begin dir len)]);
1647 :     # If nothing was found, it's a gap.
1648 :     if (! @rows) {
1649 :     $gapFound = 1;
1650 : parrello 1.19 Trace("No result. Gap found.") if T(3);
1651 : parrello 1.14 } else {
1652 :     # We got something, so we can loop through looking for gaps. The search
1653 :     # requires we sort by left point.
1654 :     my @sortedTuples = sort { $a->[1] <=> $b->[1] } @rows;
1655 :     # Loop through the tuples, stopping at the first redundancy or gap.
1656 :     for my $tuple (@sortedTuples) { last if $gapFound || $$redundancyFlag;
1657 :     # Get this tuple's data.
1658 :     my ($fid, $left, $dir, $len) = @$tuple;
1659 :     # Is it close enough to be counted?
1660 :     if ($left - $endPoint <= $maxGap) {
1661 :     # Yes. We can include this tuple.
1662 :     push @operonData, $tuple;
1663 :     # Update the end point.
1664 :     $endPoint = $left + $len;
1665 :     # Is it redundant? It's only reasonable to ask this if it's
1666 :     # an included feature.
1667 :     if ($redundancyHash->{$fid}) {
1668 :     $$redundancyFlag = 1;
1669 :     }
1670 :     } else {
1671 :     # No, it's not close enough. We've found a gap.
1672 :     $gapFound = 1;
1673 : parrello 1.19 Trace("Long distance. Gap found.") if T(3);
1674 : parrello 1.14 }
1675 :     }
1676 :     }
1677 :     # Set up for the next interval.
1678 :     $currentPosition = $nextPosition;
1679 :     }
1680 :     # If we're nonredundant, save our results.
1681 :     if (! $$redundancyFlag) {
1682 :     @retVal = @operonData;
1683 :     }
1684 : parrello 1.13 # Return the result.
1685 : parrello 1.14 return @retVal;
1686 : parrello 1.13 }
1687 :    
1688 : parrello 1.17 =head3 GenomesInPairSet
1689 :    
1690 :     my @genomes = $sap->GenomesInPairSet($pairSetID);
1691 :    
1692 :     Return a list of the IDs for all of the genomes represented in the
1693 :     specified pair set. This is useful when analyzing what data is missing
1694 :     from the coupling tables.
1695 :    
1696 :     =over 4
1697 :    
1698 :     =item pairSetID
1699 :    
1700 :     ID of the pair set to examine.
1701 :    
1702 :     =item RETURN
1703 :    
1704 :     Returns a list of the IDs for the genomes represented in the specified pair set.
1705 :    
1706 :     =back
1707 :    
1708 :     =cut
1709 :    
1710 :     sub GenomesInPairSet {
1711 :     # Get the parameters.
1712 :     my ($self, $pairSetID) = @_;
1713 :     # We'll use this hash to isolate the genome IDs.
1714 :     my %retVal;
1715 :     # Get all the pairs in this set.
1716 :     my $query = $self->Get("IsDeterminedBy", "IsDeterminedBy(from-link) = ?",
1717 :     [$pairSetID]);
1718 :     while (my $pairData = $query->Fetch()) {
1719 :     # Record the genomes for the pegs in the pair. The pegs can be found
1720 :     # separated by a colon in the pairing ID.
1721 :     for my $peg (split /:/, $pairData->PrimaryValue('to-link')) {
1722 :     $retVal{genome_of($peg)} = 1;
1723 :     }
1724 :     }
1725 :     # Return the genome IDs.
1726 :     return keys %retVal;
1727 :     }
1728 :    
1729 : parrello 1.1
1730 :     =head2 Virtual Methods
1731 :    
1732 : parrello 1.11 =head3 PreferredName
1733 :    
1734 :     my $name = $erdb->PreferredName();
1735 :    
1736 :     Return the variable name to use for this database when generating code.
1737 :    
1738 :     =cut
1739 :    
1740 :     sub PreferredName {
1741 :     return 'sap';
1742 :     }
1743 :    
1744 : parrello 1.1 =head3 GetSourceObject
1745 :    
1746 :     my $source = $erdb->GetSourceObject();
1747 :    
1748 :     Return the object to be used in creating load files for this database. This is
1749 :     only the default source object. Loaders have the option of overriding the chosen
1750 : parrello 1.11 source object when constructing the L<ERDBLoadGroup> objects.
1751 : parrello 1.1
1752 :     =cut
1753 :    
1754 :     sub GetSourceObject {
1755 :     my ($self) = @_;
1756 :     # Insure the source object exists in our internal cache.
1757 :     if (! defined $self->{source}) {
1758 :     # We require the FIG object. If the user has no intention of
1759 :     # doing a load, this method won't be used, so he won't need to
1760 :     # have the FIG object on his system.
1761 :     require FIG;
1762 :     $self->{source} = FIG->new();
1763 :     }
1764 :     # Return it to the caller.
1765 :     return $self->{source};
1766 :     }
1767 :    
1768 :     =head3 SectionList
1769 :    
1770 :     my @sections = $erdb->SectionList();
1771 :    
1772 :     Return a list of the names for the different data sections used when loading this database.
1773 :     The default is a single string, in which case there is only one section representing the
1774 :     entire database.
1775 :    
1776 :     =cut
1777 :    
1778 :     sub SectionList {
1779 :     # Get the parameters.
1780 :     my ($self) = @_;
1781 : parrello 1.39 # The section names will be put in here.
1782 :     my @retVal;
1783 : parrello 1.41 # Get the name of the section control file.
1784 : parrello 1.43 my $controlFileName = ERDBGenerate::CreateFileName("SectionList", undef, 'control', $self->LoadDirectory());
1785 : parrello 1.41 # Check to see if it exists.
1786 :     if (-f $controlFileName) {
1787 :     # Yes. Pull out the sections from it.
1788 : parrello 1.42 Trace("Reading section list from $controlFileName.") if T(ERDBGenerate => 2);
1789 : parrello 1.41 @retVal = Tracer::GetFile($controlFileName);
1790 :     } else {
1791 :     # No, so we have to create it. Get the genome hash.
1792 :     my $genomes = $self->GenomeHash();
1793 :     @retVal = sort keys %$genomes;
1794 :     # Append the global section.
1795 :     push @retVal, GLOBAL;
1796 :     # Write out the control file with the new sections.
1797 : parrello 1.42 Trace("Writing section list to $controlFileName.") if T(ERDBGenerate => 2);
1798 : parrello 1.41 Tracer::PutFile($controlFileName, \@retVal);
1799 :     }
1800 : parrello 1.1 # Return the section list.
1801 :     return @retVal;
1802 :     }
1803 :    
1804 :     =head3 Loader
1805 :    
1806 :     my $groupLoader = $erdb->Loader($groupName, $source, $options);
1807 :    
1808 : parrello 1.11 Return an L<ERDBLoadGroup> object for the specified load group. This method is used
1809 :     by L<ERDBGenerator.pl> to create the load group objects. If you are not using
1810 :     L<ERDBGenerator.pl>, you don't need to override this method.
1811 : parrello 1.1
1812 :     =over 4
1813 :    
1814 :     =item groupName
1815 :    
1816 :     Name of the load group whose object is to be returned. The group name is
1817 :     guaranteed to be a single word with only the first letter capitalized.
1818 :    
1819 :     =item source
1820 :    
1821 :     The source object used to access the data from which the load file is derived. This
1822 :     is the same object returned by L</GetSourceObject>; however, we allow the caller to pass
1823 :     it in as a parameter so that we don't end up creating multiple copies of a potentially
1824 :     expensive data structure. It is permissible for this value to be undefined, in which
1825 :     case the source will be retrieved the first time the client asks for it.
1826 :    
1827 :     =item options
1828 :    
1829 :     Reference to a hash of command-line options.
1830 :    
1831 :     =item RETURN
1832 :    
1833 : parrello 1.11 Returns an L<ERDBLoadGroup> object that can be used to process the specified load group
1834 : parrello 1.1 for this database.
1835 :    
1836 :     =back
1837 :    
1838 :     =cut
1839 :    
1840 :     sub Loader {
1841 :     # Get the parameters.
1842 :     my ($self, $groupName, $options) = @_;
1843 :     # Compute the loader name.
1844 :     my $loaderClass = "${groupName}SaplingLoader";
1845 :     # Pull in its definition.
1846 :     require "$loaderClass.pm";
1847 :     # Create an object for it.
1848 :     my $retVal = eval("$loaderClass->new(\$self, \$options)");
1849 :     # Insure it worked.
1850 :     Confess("Could not create $loaderClass object: $@") if $@;
1851 :     # Return it to the caller.
1852 :     return $retVal;
1853 :     }
1854 :    
1855 :     =head3 LoadGroupList
1856 :    
1857 :     my @groups = $erdb->LoadGroupList();
1858 :    
1859 :     Returns a list of the names for this database's load groups. This method is used
1860 : parrello 1.11 by L<ERDBGenerator.pl> when the user wishes to load all table groups. The default
1861 : parrello 1.1 is a single group called 'All' that loads everything.
1862 :    
1863 :     =cut
1864 :    
1865 :     sub LoadGroupList {
1866 :     # Return the list.
1867 : parrello 1.34 return qw(Alignment Expression Subsystem Family Feature Protein Genome Scenario Model);
1868 : parrello 1.1 }
1869 :    
1870 :     =head3 LoadDirectory
1871 :    
1872 :     my $dirName = $erdb->LoadDirectory();
1873 :    
1874 :     Return the name of the directory in which load files are kept. The default is
1875 :     the FIG temporary directory, which is a really bad choice, but it's always there.
1876 :    
1877 :     =cut
1878 :    
1879 :     sub LoadDirectory {
1880 :     # Get the parameters.
1881 :     my ($self) = @_;
1882 :     # Return the directory name.
1883 :     return $self->{loadDirectory};
1884 :     }
1885 :    
1886 : parrello 1.21 =head3 UseInternalDBD
1887 :    
1888 :     my $flag = $erdb->UseInternalDBD();
1889 :    
1890 :     Return TRUE if this database should be allowed to use an internal DBD.
1891 :     The internal DBD is stored in the C<_metadata> table, which is created
1892 :     when the database is loaded. The Sapling uses an internal DBD.
1893 :    
1894 :     =cut
1895 :    
1896 :     sub UseInternalDBD {
1897 :     return 1;
1898 :     }
1899 : parrello 1.1
1900 :     1;

MCS Webmaster
ViewVC Help
Powered by ViewVC 1.0.3