View of /Sprout/SproutLoad.pm

Revision 1.27
Mon Jan 30 22:00:04 2006 UTC (13 years, 9 months ago) by parrello
Branch: MAIN
Changes since 1.26: +1 -1 lines

#!/usr/bin/perl -w

package SproutLoad;

    use strict;
    use Tracer;
    use PageBuilder;
    use ERDBLoad;
    use FIG;
    use Sprout;
    use Stats;
    use BasicLocation;
    use HTML;

=head1 Sprout Load Methods

=head2 Introduction

This object contains the methods needed to copy data from the FIG data store to the
Sprout database. It makes heavy use of the ERDBLoad object to manage the load into
individual tables. The client can create an instance of this object and then
call methods for each group of tables to load. For example, the following code will
load the Genome- and Feature-related tables. (It is presumed the first command line
parameter contains the name of a file specifying the genomes.)

    my $fig = FIG->new();
    my $sprout = SFXlate->new_sprout_only();
    my $spl = SproutLoad->new($sprout, $fig, $ARGV[0]);
    my $stats = $spl->LoadGenomeData();
    $stats->Accumulate($spl->LoadFeatureData());
    print $stats->Show();

This module makes use of the internal Sprout property C<_erdb>.

It is worth noting that the FIG object does not need to be a real one. Any object
that implements the FIG methods for data retrieval could be used. So, for example,
this object could be used to copy data from one Sprout database to another, or
from any FIG-compliant data store implemented in the future.

To ensure that this is possible, each time the FIG object is used, it will be via
a variable called C<$fig>. This makes it fairly straightforward to determine which
FIG methods are required to load the Sprout database.

This object creates the load files; however, the tables are not created until it
is time to actually do the load from the files into the target database.
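
Because the loader only talks to the source through C<$fig>, a stand-in object is
enough for experiments. The sketch below is hypothetical: the package name C<MockFIG>
and its canned data are invented for illustration, and it implements only three of
the methods the loader calls (C<genomes>, C<genus_species>, and C<all_contigs>).

```perl
use strict;
use warnings;

# Hypothetical stand-in for a FIG object; MockFIG is not part of the real code base.
package MockFIG;

sub new {
    my ($class, %genomeData) = @_;
    # %genomeData maps genome ID => { name => ..., contigs => [...] }.
    return bless { data => \%genomeData }, $class;
}

# Return the IDs of all known genomes (the "complete only" flag is ignored here).
sub genomes {
    my ($self) = @_;
    my @ids = sort keys %{$self->{data}};
    return @ids;
}

# Return the scientific name of a genome.
sub genus_species {
    my ($self, $genomeID) = @_;
    return $self->{data}{$genomeID}{name};
}

# Return the contig IDs of a genome.
sub all_contigs {
    my ($self, $genomeID) = @_;
    return @{$self->{data}{$genomeID}{contigs}};
}

package main;

my $mockFig = MockFIG->new(
    '83333.1' => { name => 'Escherichia coli K12', contigs => ['NC_000913'] },
);
my ($genus, $species, @extra) = split / /, $mockFig->genus_species('83333.1');
print "$genus $species (@extra)\n";
```

A real replacement would have to supply every FIG method used by the load methods,
not just these three.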

=cut

#: Constructor SproutLoad->new();

=head2 Public Methods

=head3 new

C<< my $spl = SproutLoad->new($sprout, $fig, $genomeFile, $subsysFile, $options); >>

Construct a new Sprout Loader object, specifying the two participating databases and
the names of the files containing the lists of genomes and subsystems to use.

=over 4

=item sprout

Sprout object representing the target database. This also specifies the directory to
be used for creating the load files.

=item fig

FIG object representing the source data store from which the data is to be taken.

=item genomeFile

Either the name of the file containing the list of genomes to load or a reference to
a hash of genome IDs to access codes. If nothing is specified, all complete genomes
will be loaded and the access code will default to 1. The genome list is presumed
to be all-inclusive. In other words, all existing data in the target database will
be deleted and replaced with the data on the specified genomes. If a file is specified,
it should contain one genome ID and access code per line, tab-separated.

=item subsysFile

Either the name of the file containing the list of trusted subsystems or a reference
to a list of subsystem names. If nothing is specified, all known subsystems will be
considered trusted. Only subsystem data related to the trusted subsystems is loaded.

=item options

Reference to a hash of command-line options.

=back
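
The genome file format described above can be parsed as in the following sketch.
C<ParseGenomeLines> is a hypothetical helper written for this illustration, not a
method of this module. It takes lines that have already been read and chomped and
defaults a missing access code to 1. Skipping blank lines is an addition that the
constructor itself does not make.

```perl
use strict;
use warnings;

# Build a genome-ID => access-code hash from tab-separated lines.
# ParseGenomeLines is a hypothetical helper, not part of SproutLoad.
sub ParseGenomeLines {
    my (@lines) = @_;
    my %genomes;
    for my $line (@lines) {
        # Skip blank lines (an assumption; the constructor does not do this).
        next if $line =~ /^\s*$/;
        my ($genomeID, $accessCode) = split /\t/, $line;
        # An omitted access code defaults to 1.
        $accessCode = 1 if ! defined $accessCode || $accessCode eq '';
        $genomes{$genomeID} = $accessCode;
    }
    return \%genomes;
}

my $parsed = ParseGenomeLines("83333.1\t2", "100226.1");
for my $id (sort keys %{$parsed}) {
    print "$id => $parsed->{$id}\n";
}
```

The resulting hash reference has the same shape as the hash that can be passed
directly as the C<genomeFile> parameter.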

=cut

sub new {
    # Get the parameters.
    my ($class, $sprout, $fig, $genomeFile, $subsysFile, $options) = @_;
    # Load the list of genomes into a hash.
    my %genomes;
    if (! defined($genomeFile) || $genomeFile eq '') {
        # Here we want all the complete genomes and an access code of 1.
        my @genomeList = $fig->genomes(1);
        %genomes = map { $_ => 1 } @genomeList;
    } else {
        my $type = ref $genomeFile;
        Trace("Genome file parameter type is \"$type\".") if T(3);
        if ($type eq 'HASH') {
            # Here the user specified a hash of genome IDs to access codes, which is
            # exactly what we want.
            %genomes = %{$genomeFile};
        } elsif (! $type || $type eq 'SCALAR' ) {
            # The caller specified a file, so read the genomes from the file. (Note
            # that ref returns an empty string for a plain, non-reference scalar.)
            my @genomeList = Tracer::GetFile($genomeFile);
            if (! @genomeList) {
                # It's an error if the genome file is empty or not found.
                Confess("No genomes found in file \"$genomeFile\".");
            } else {
                # We build the genome Hash using a loop rather than "map" so that
                # an omitted access code can be defaulted to 1.
                for my $genomeLine (@genomeList) {
                    my ($genomeID, $accessCode) = split("\t", $genomeLine);
                    if (! defined $accessCode) {
                        $accessCode = 1;
                    }
                    $genomes{$genomeID} = $accessCode;
                }
            }
        } else {
            Confess("Invalid genome parameter ($type) in SproutLoad constructor.");
        }
    }
    # Load the list of trusted subsystems.
    my %subsystems = ();
    if (! defined $subsysFile || $subsysFile eq '') {
        # Here we want all the subsystems.
        %subsystems = map { $_ => 1 } $fig->all_subsystems();
    } else {
        my $type = ref $subsysFile;
        if ($type eq 'ARRAY') {
            # Here the user passed in a list of subsystems.
            %subsystems = map { $_ => 1 } @{$subsysFile};
        } elsif (! $type || $type eq 'SCALAR') {
            # Here the list of subsystems is in a file.
            if (! -e $subsysFile) {
                # It's an error if the file does not exist.
                Confess("Trusted subsystem file not found.");
            } else {
                # GetFile automatically chomps end-of-line characters, so this
                # is an easy task.
                %subsystems = map { $_ => 1 } Tracer::GetFile($subsysFile);
            }
        } else {
            Confess("Invalid subsystem parameter in SproutLoad constructor.");
        }
    }
    # Get the data directory from the Sprout object.
    my ($directory) = $sprout->LoadInfo();
    # Create the Sprout load object.
    my $retVal = {
                  fig => $fig,
                  genomes => \%genomes,
                  subsystems => \%subsystems,
                  sprout => $sprout,
                  loadDirectory => $directory,
                  erdb => $sprout->{_erdb},
                  loaders => [],
                  options => $options
                 };
    # Bless and return it.
    bless $retVal, $class;
    return $retVal;
}

=head3 LoadOnly

C<< my $flag = $spl->LoadOnly; >>

Return TRUE if we are in load-only mode, else FALSE.

=cut

sub LoadOnly {
    my ($self) = @_;
    return $self->{options}->{loadOnly};
}

=head3 PrimaryOnly

C<< my $flag = $spl->PrimaryOnly; >>

Return TRUE if only the main entity is to be loaded, else FALSE.

=cut

sub PrimaryOnly {
    my ($self) = @_;
    return $self->{options}->{primaryOnly};
}

=head3 LoadGenomeData

C<< my $stats = $spl->LoadGenomeData(); >>

Load the Genome, Contig, and Sequence data from FIG into Sprout.

The Sequence table is the largest single relation in the Sprout database, so this
method is expected to be slow and clumsy. At some point we will need to make it
restartable, since an error 10 gigabytes through a 20-gigabyte load is bound to be
very annoying otherwise.

The following relations are loaded by this method.

    Genome
    HasContig
    Contig
    IsMadeUpOf
    Sequence

=over 4

=item RETURNS

Returns a statistics object for the loads.

=back
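
The way a contig is cut into Sequence records can be sketched as follows.
C<ChunkContig> is a hypothetical helper, and the chunk size of 1000 is an arbitrary
value chosen for the example. In the loader the size comes from the Sprout object's
C<MaxSequence> method. Positions are 1-based, and each sequence ID is the contig ID
followed by the chunk's starting position.

```perl
use strict;
use warnings;

# Split a contig of $contigLen bases into chunks of at most $chunkSize.
# Returns a list of [sequenceID, start, length] tuples; positions are 1-based.
# ChunkContig is a hypothetical helper written for this illustration.
sub ChunkContig {
    my ($sproutContigID, $contigLen, $chunkSize) = @_;
    die "Chunk size must be positive.\n" if $chunkSize <= 0;
    my @chunks;
    for (my $i = 1; $i <= $contigLen; $i += $chunkSize) {
        # Compute the endpoint of this chunk, clipped to the contig length.
        my $end = $i + $chunkSize - 1;
        $end = $contigLen if $end > $contigLen;
        push @chunks, ["$sproutContigID.$i", $i, $end + 1 - $i];
    }
    return @chunks;
}

for my $chunk (ChunkContig('83333.1:NC_000913', 2500, 1000)) {
    my ($seqID, $start, $len) = @{$chunk};
    print "$seqID starts at $start, length $len\n";
}
```

Only the last chunk of a contig can be shorter than the chunk size.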

=cut
#: Return Type $%;
sub LoadGenomeData {
    # Get this object instance.
    my ($self) = @_;
    # Get the FIG object.
    my $fig = $self->{fig};
    # Get the genome count.
    my $genomeHash = $self->{genomes};
    my $genomeCount = scalar keys %{$genomeHash};
    # Create load objects for each of the tables we're loading.
    my $loadGenome = $self->_TableLoader('Genome');
    my $loadHasContig = $self->_TableLoader('HasContig', $self->PrimaryOnly);
    my $loadContig = $self->_TableLoader('Contig', $self->PrimaryOnly);
    my $loadIsMadeUpOf = $self->_TableLoader('IsMadeUpOf', $self->PrimaryOnly);
    my $loadSequence = $self->_TableLoader('Sequence', $self->PrimaryOnly);
    if ($self->{options}->{loadOnly}) {
        Trace("Loading from existing files.") if T(2);
    } else {
        Trace("Generating genome data.") if T(2);
        # Now we loop through the genomes, generating the data for each one.
        for my $genomeID (sort keys %{$genomeHash}) {
            Trace("Generating data for genome $genomeID.") if T(3);
            $loadGenome->Add("genomeIn");
            # The access code comes in via the genome hash.
            my $accessCode = $genomeHash->{$genomeID};
            # Get the genus, species, and strain from the scientific name. Note that we append
            # the genome ID to the strain. In some cases this is the totality of the strain name.
            my ($genus, $species, @extraData) = split / /, $fig->genus_species($genomeID);
            my $extra = join " ", @extraData, "[$genomeID]";
            # Get the full taxonomy.
            my $taxonomy = $fig->taxonomy_of($genomeID);
            # Output the genome record.
            $loadGenome->Put($genomeID, $accessCode, $fig->is_complete($genomeID), $genus,
                             $species, $extra, $taxonomy);
            # Now we loop through each of the genome's contigs.
            my @contigs = $fig->all_contigs($genomeID);
            for my $contigID (@contigs) {
                Trace("Processing contig $contigID for $genomeID.") if T(4);
                $loadContig->Add("contigIn");
                $loadSequence->Add("contigIn");
                # Create the contig ID.
                my $sproutContigID = "$genomeID:$contigID";
                # Create the contig record and relate it to the genome.
                $loadContig->Put($sproutContigID);
                $loadHasContig->Put($genomeID, $sproutContigID);
                # Now we need to split the contig into sequences. The maximum sequence size is
                # a property of the Sprout object.
                my $chunkSize = $self->{sprout}->MaxSequence();
                # Now we get the sequence a chunk at a time.
                my $contigLen = $fig->contig_ln($genomeID, $contigID);
                for (my $i = 1; $i <= $contigLen; $i += $chunkSize) {
                    $loadSequence->Add("chunkIn");
                    # Compute the endpoint of this chunk.
                    my $end = FIG::min($i + $chunkSize - 1, $contigLen);
                    # Get the actual DNA.
                    my $dna = $fig->get_dna($genomeID, $contigID, $i, $end);
                    # Compute the sequenceID.
                    my $seqID = "$sproutContigID.$i";
                    # Write out the data. For now, the quality vector is always "unknown".
                    $loadIsMadeUpOf->Put($sproutContigID, $seqID, $end + 1 - $i, $i);
                    $loadSequence->Put($seqID, "unknown", $dna);
                }
            }
        }
    }
    # Finish the loads.
    my $retVal = $self->_FinishAll();
    # Return the result.
    return $retVal;
}

=head3 LoadCouplingData

C<< my $stats = $spl->LoadCouplingData(); >>

Load the coupling and evidence data from FIG into Sprout.

The coupling data specifies which genome features are functionally coupled. The
evidence data explains why the coupling is functional.

The following relations are loaded by this method.

    Coupling
    IsEvidencedBy
    PCH
    ParticipatesInCoupling
    UsesAsEvidence

=over 4

=item RETURNS

Returns a statistics object for the loads.

=back
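
Each coupling is stored only once, no matter which of its two features is
encountered first. A minimal sketch of that idea follows. C<CouplingKey> is a
hypothetical stand-in for C<Sprout::CouplingID>, whose actual ID format may differ.
Here the key is simply the two feature IDs in sorted order, and the scores are
made-up values.

```perl
use strict;
use warnings;

# Hypothetical canonical key: the two PEG IDs in sorted order, so that
# (A, B) and (B, A) produce the same string. The real Sprout::CouplingID
# may format its IDs differently.
sub CouplingKey {
    my ($peg1, $peg2) = @_;
    return join " ", sort { $a cmp $b } $peg1, $peg2;
}

# Given [peg1, peg2, score] triples, keep only the first occurrence of each pair.
sub UniqueCouplings {
    my (@couplings) = @_;
    my (%seen, @unique);
    for my $coupling (@couplings) {
        my ($peg1, $peg2, $score) = @{$coupling};
        my $key = CouplingKey($peg1, $peg2);
        # Skip pairs we have already recorded in either order.
        next if exists $seen{$key};
        $seen{$key} = $score;
        push @unique, [$key, $score];
    }
    return @unique;
}

my @distinct = UniqueCouplings(
    ['fig|83333.1.peg.1', 'fig|83333.1.peg.2', 30],
    ['fig|83333.1.peg.2', 'fig|83333.1.peg.1', 30],
    ['fig|83333.1.peg.1', 'fig|83333.1.peg.3', 12],
);
print scalar(@distinct), " distinct couplings\n";
```

Because all couplings occur within a single genome, the loader can reset its
duplicate hash for each genome, which keeps the hash small.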

=cut
#: Return Type $%;
sub LoadCouplingData {
    # Get this object instance.
    my ($self) = @_;
    # Get the FIG object.
    my $fig = $self->{fig};
    # Get the genome hash.
    my $genomeFilter = $self->{genomes};
    my $genomeCount = scalar keys %{$genomeFilter};
    my $featureCount = $genomeCount * 4000;
    # Start the loads.
    my $loadCoupling = $self->_TableLoader('Coupling');
    my $loadIsEvidencedBy = $self->_TableLoader('IsEvidencedBy', $self->PrimaryOnly);
    my $loadPCH = $self->_TableLoader('PCH', $self->PrimaryOnly);
    my $loadParticipatesInCoupling = $self->_TableLoader('ParticipatesInCoupling', $self->PrimaryOnly);
    my $loadUsesAsEvidence = $self->_TableLoader('UsesAsEvidence', $self->PrimaryOnly);
    if ($self->{options}->{loadOnly}) {
        Trace("Loading from existing files.") if T(2);
    } else {
        Trace("Generating coupling data.") if T(2);
        # Loop through the genomes found.
        for my $genome (sort keys %{$genomeFilter}) {
            Trace("Generating coupling data for $genome.") if T(3);
            $loadCoupling->Add("genomeIn");
            # Create a hash table for holding coupled pairs. We use this to prevent
            # duplicates. For example, if A is coupled to B, we don't want to also
            # assert that B is coupled to A, because we already know it. Fortunately,
            # all couplings occur within a genome, so we can keep the hash table
            # size reasonably small.
            my %dupHash = ();
            # Get all of the genome's PEGs.
            my @pegs = $fig->pegs_of($genome);
            # Loop through the PEGs.
            for my $peg1 (@pegs) {
                $loadCoupling->Add("pegIn");
                Trace("Processing PEG $peg1 for $genome.") if T(4);
                # Get a list of the coupled PEGs.
                my @couplings = $fig->coupled_to($peg1);
                # For each coupled PEG, we check whether the coupling has already
                # been recorded. If not, we create it.
                for my $coupleData (@couplings) {
                    my ($peg2, $score) = @{$coupleData};
                    # Compute the coupling ID.
                    my $coupleID = Sprout::CouplingID($peg1, $peg2);
                    if (! exists $dupHash{$coupleID}) {
                        $loadCoupling->Add("couplingIn");
                        # Here we have a new coupling to store in the load files.
                        Trace("Storing coupling ($coupleID) with score $score.") if T(4);
                        # Ensure we don't do this again.
                        $dupHash{$coupleID} = $score;
                        # Write the coupling record.
                        $loadCoupling->Put($coupleID, $score);
                        # Connect it to the coupled PEGs.
                        $loadParticipatesInCoupling->Put($peg1, $coupleID, 1);
                        $loadParticipatesInCoupling->Put($peg2, $coupleID, 2);
                        # Get the evidence for this coupling.
                        my @evidence = $fig->coupling_evidence($peg1, $peg2);
                        # Organize the evidence into a hash table.
                        my %evidenceMap = ();
                        # Process each evidence item.
                        for my $evidenceData (@evidence) {
                            $loadPCH->Add("evidenceIn");
                            my ($peg3, $peg4, $usage) = @{$evidenceData};
                            # Only proceed if the evidence is from a Sprout
                            # genome.
                            if ($genomeFilter->{$fig->genome_of($peg3)}) {
                                $loadUsesAsEvidence->Add("evidenceChosen");
                                my $evidenceKey = "$coupleID $peg3 $peg4";
                                # We store this evidence in the hash if the usage
                                # is nonzero or no prior evidence has been found. This
                                # ensures that if there is duplicate evidence, we
                                # at least keep the meaningful ones. Only evidence in
                                # the hash makes it to the output.
                                if ($usage || ! exists $evidenceMap{$evidenceKey}) {
                                    $evidenceMap{$evidenceKey} = $evidenceData;
                                }
                            }
                        }
                        for my $evidenceID (keys %evidenceMap) {
                            # Create the evidence record.
                            my ($peg3, $peg4, $usage) = @{$evidenceMap{$evidenceID}};
                            $loadPCH->Put($evidenceID, $usage);
                            # Connect it to the coupling.
                            $loadIsEvidencedBy->Put($coupleID, $evidenceID);
                            # Connect it to the features.
                            $loadUsesAsEvidence->Put($evidenceID, $peg3, 1);
                            $loadUsesAsEvidence->Put($evidenceID, $peg4, 2);
                        }
                    }
                }
            }
        }
    }
    # All done. Finish the load.
    my $retVal = $self->_FinishAll();
    return $retVal;
}

=head3 LoadFeatureData

C<< my $stats = $spl->LoadFeatureData(); >>

Load the feature data from FIG into Sprout.

Features represent annotated genes, and are therefore the heart of the data store.

The following relations are loaded by this method.

    Feature
    FeatureAlias
    FeatureLink
    FeatureTranslation
    FeatureUpstream
    IsLocatedIn

=over 4

=item RETURNS

Returns a statistics object for the loads.

=back
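
Feature locations are split so that no IsLocatedIn record exceeds the maximum
segment size. The following sketch shows the splitting on a plain hash. It is a
simplified stand-in for C<BasicLocation>, whose C<Peel> method may order or
represent the pieces differently, especially on the minus strand. C<SplitLocation>
and the segment size of 2000 are assumptions made for the example.

```perl
use strict;
use warnings;

# Split one location into pieces no longer than $maxSegment.
# A location here is a plain hash (contig, left, dir, len); this is a
# simplified stand-in for BasicLocation, not its actual interface.
sub SplitLocation {
    my ($loc, $maxSegment) = @_;
    die "Segment size must be positive.\n" if $maxSegment <= 0;
    my @pieces;
    my ($left, $remaining) = ($loc->{left}, $loc->{len});
    while ($remaining > 0) {
        # Take a full segment, or whatever is left if that is shorter.
        my $len = $remaining < $maxSegment ? $remaining : $maxSegment;
        push @pieces, { contig => $loc->{contig}, left => $left,
                        dir => $loc->{dir}, len => $len };
        $left += $len;
        $remaining -= $len;
    }
    return @pieces;
}

my @segments = SplitLocation({ contig => '83333.1:NC_000913', left => 100,
                               dir => '+', len => 4500 }, 2000);
# Number the pieces from 1, as the loader does for the location index.
my $locIndex = 1;
for my $piece (@segments) {
    print join("\t", $piece->{contig}, $piece->{left}, $piece->{dir},
               $piece->{len}, $locIndex++), "\n";
}
```

In the loader the index keeps counting across all of a feature's locations, so
the pieces of a multi-location feature are numbered in one sequence.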

=cut
#: Return Type $%;
sub LoadFeatureData {
    # Get this object instance.
    my ($self) = @_;
    # Get the FIG object.
    my $fig = $self->{fig};
    # Get the table of genome IDs.
    my $genomeHash = $self->{genomes};
    # Create load objects for each of the tables we're loading.
    my $loadFeature = $self->_TableLoader('Feature');
    my $loadIsLocatedIn = $self->_TableLoader('IsLocatedIn', $self->PrimaryOnly);
    my $loadFeatureAlias = $self->_TableLoader('FeatureAlias');
    my $loadFeatureLink = $self->_TableLoader('FeatureLink');
    my $loadFeatureTranslation = $self->_TableLoader('FeatureTranslation');
    my $loadFeatureUpstream = $self->_TableLoader('FeatureUpstream');
    # Get the maximum segment size. We need this later for splitting up the
    # locations.
    my $chunkSize = $self->{sprout}->MaxSegment();
    if ($self->{options}->{loadOnly}) {
        Trace("Loading from existing files.") if T(2);
    } else {
        Trace("Generating feature data.") if T(2);
        # Now we loop through the genomes, generating the data for each one.
        for my $genomeID (sort keys %{$genomeHash}) {
            Trace("Loading features for genome $genomeID.") if T(3);
            $loadFeature->Add("genomeIn");
            # Get the feature list for this genome.
            my $features = $fig->all_features_detailed($genomeID);
            # Loop through the features.
            for my $featureData (@{$features}) {
                $loadFeature->Add("featureIn");
                # Split the tuple.
                my ($featureID, $locations, undef, $type) = @{$featureData};
                # Create the feature record.
                $loadFeature->Put($featureID, 1, $type);
                # Create the aliases.
                for my $alias ($fig->feature_aliases($featureID)) {
                    $loadFeatureAlias->Put($featureID, $alias);
                }
                # Get the links.
                my @links = $fig->fid_links($featureID);
                for my $link (@links) {
                    $loadFeatureLink->Put($featureID, $link);
                }
                # If this is a peg, generate the translation and the upstream.
                if ($type eq 'peg') {
                    $loadFeatureTranslation->Add("pegIn");
                    my $translation = $fig->get_translation($featureID);
                    if ($translation) {
                        $loadFeatureTranslation->Put($featureID, $translation);
                    }
                    # We use the default upstream values of u=200 and c=100.
                    my $upstream = $fig->upstream_of($featureID, 200, 100);
                    if ($upstream) {
                        $loadFeatureUpstream->Put($featureID, $upstream);
                    }
                }
                # This part is the roughest. We need to relate the features to contig
                # locations, and the locations must be split so that none of them exceed
                # the maximum segment size. This simplifies the genes_in_region processing
                # for Sprout.
                my @locationList = split /\s*,\s*/, $locations;
                # Create the location position indicator.
                my $i = 1;
                # Loop through the locations.
                for my $location (@locationList) {
                    # Parse the location.
                    my $locObject = BasicLocation->new("$genomeID:$location");
                    # Split it into a list of chunks.
                    my @locOList = ();
                    while (my $peeling = $locObject->Peel($chunkSize)) {
                        $loadIsLocatedIn->Add("peeling");
                        push @locOList, $peeling;
                    }
                    push @locOList, $locObject;
                    # Loop through the chunks, creating IsLocatedIn records. The variable
                    # "$i" will be used to keep the location index.
                    for my $locChunk (@locOList) {
                        $loadIsLocatedIn->Put($featureID, $locChunk->Contig, $locChunk->Left,
                                              $locChunk->Dir, $locChunk->Length, $i);
                        $i++;
                    }
                }
            }
        }
    }
    # Finish the loads.
    my $retVal = $self->_FinishAll();
    return $retVal;
}

=head3 LoadBBHData

C<< my $stats = $spl->LoadBBHData(); >>

Load the bidirectional best hit data from FIG into Sprout.

Sprout does not store information on similarities. Instead, it has only the
bi-directional best hits. Even so, the BBH table is one of the largest in
the database.

The following relations are loaded by this method.

    IsBidirectionalBestHitOf

=over 4

=item RETURNS

Returns a statistics object for the loads.

=back
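
Only hits whose target feature belongs to one of the genomes being loaded are
kept. The sketch below illustrates that filter. C<GenomeOf> is a hypothetical
substitute for C<< $fig->genome_of >> that assumes feature IDs of the form
C<fig|83333.1.peg.4>, and the scores are made-up values.

```perl
use strict;
use warnings;

# Extract the genome ID from a feature ID such as "fig|83333.1.peg.4".
# The pattern is an assumption about the ID format, standing in for
# $fig->genome_of. Returns undef if the ID does not match.
sub GenomeOf {
    my ($featureID) = @_;
    return $featureID =~ /^fig\|(\d+\.\d+)\./ ? $1 : undef;
}

# Keep only the [targetID, score] hits whose target genome is in %$genomeHash.
# Each kept hit is returned as [targetID, targetGenomeID, score].
sub FilterHits {
    my ($genomeHash, @hits) = @_;
    my @kept;
    for my $hit (@hits) {
        my ($targetID, $score) = @{$hit};
        my $targetGenome = GenomeOf($targetID);
        if (defined $targetGenome && $genomeHash->{$targetGenome}) {
            push @kept, [$targetID, $targetGenome, $score];
        }
    }
    return @kept;
}

my %wanted = ('83333.1' => 1);
my @keptHits = FilterHits(\%wanted,
    ['fig|83333.1.peg.7', 0.9],
    ['fig|100226.1.peg.3', 0.8],
);
print "$_->[0] in $_->[1] scored $_->[2]\n" for @keptHits;
```

Like the loader, this sketch tests the hash value for truth rather than for
existence, so a genome whose access code is 0 would be filtered out.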

=cut
#: Return Type $%;
sub LoadBBHData {
    # Get this object instance.
    my ($self) = @_;
    # Get the FIG object.
    my $fig = $self->{fig};
    # Get the table of genome IDs.
    my $genomeHash = $self->{genomes};
    # Create load objects for each of the tables we're loading.
    my $loadIsBidirectionalBestHitOf = $self->_TableLoader('IsBidirectionalBestHitOf');
    if ($self->{options}->{loadOnly}) {
        Trace("Loading from existing files.") if T(2);
    } else {
        Trace("Generating BBH data.") if T(2);
        # Now we loop through the genomes, generating the data for each one.
        for my $genomeID (sort keys %{$genomeHash}) {
            $loadIsBidirectionalBestHitOf->Add("genomeIn");
            Trace("Processing features for genome $genomeID.") if T(3);
            # Get the feature list for this genome.
            my $features = $fig->all_features_detailed($genomeID);
            # Loop through the features.
            for my $featureData (@{$features}) {
                # Split the tuple.
                my ($featureID, $locations, $aliases, $type) = @{$featureData};
                # Get the bi-directional best hits.
                my @bbhList = $fig->bbhs($featureID);
                for my $bbhEntry (@bbhList) {
                    # Get the target feature ID and the score.
                    my ($targetID, $score) = @{$bbhEntry};
                    # Check the target feature's genome.
                    my $targetGenomeID = $fig->genome_of($targetID);
                    # Only proceed if it's one of our genomes.
                    if ($genomeHash->{$targetGenomeID}) {
                        $loadIsBidirectionalBestHitOf->Put($featureID, $targetID, $targetGenomeID,
                                                           $score);
                    }
                }
            }
        }
    }
    # Finish the loads.
    my $retVal = $self->_FinishAll();
    return $retVal;
}

=head3 LoadSubsystemData

C<< my $stats = $spl->LoadSubsystemData(); >>

Load the subsystem data from FIG into Sprout.

Subsystems are groupings of genetic roles that work together to effect a specific
chemical reaction. Similar organisms require similar subsystems. To curate a subsystem,
a spreadsheet is created with genomes on one axis and subsystem roles on the other
axis. Similar features are then mapped into the cells, allowing the annotation of one
genome's roles to be used to assist in the annotation of others.
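
The spreadsheet can be pictured as a grid of cells keyed by subsystem, genome,
and role column. The sketch below builds such cells from plain Perl structures.
C<BuildCells>, its parameters, and the sample data are hypothetical. The loader
obtains the same information from the subsystem object one cell at a time.

```perl
use strict;
use warnings;

# Build spreadsheet cells for one subsystem. $genomes and $roles are array
# references giving the row and column labels; $pegsIn->{$row}{$col} is a
# reference to the list of features in that cell. All names are hypothetical.
sub BuildCells {
    my ($subsysID, $genomes, $roles, $pegsIn) = @_;
    my %cells;
    for my $row (0 .. $#{$genomes}) {
        for my $col (0 .. $#{$roles}) {
            my $pegs = $pegsIn->{$row}{$col};
            # Empty cells are not stored.
            next if ! $pegs || ! @{$pegs};
            # The cell ID combines the subsystem, the genome, and the column index.
            my $cellID = "$subsysID:$genomes->[$row]:$col";
            $cells{$cellID} = { role => $roles->[$col], pegs => $pegs };
        }
    }
    return \%cells;
}

my $cells = BuildCells('DemoSubsystem', ['83333.1', '100226.1'],
                       ['roleA', 'roleB'],
                       { 0 => { 1 => ['fig|83333.1.peg.10'] } });
print join("\n", sort keys %{$cells}), "\n";
```

Using the column index rather than the role name in the cell ID keeps the ID
short, since role names can be long.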

The following relations are loaded by this method.

    Subsystem
    Role
    RoleEC
    SSCell
    ContainsFeature
    IsGenomeOf
    IsRoleOf
    OccursInSubsystem
    ParticipatesIn
    HasSSCell
    ConsistsOfRoles
    RoleSubset
    HasRoleSubset
    ConsistsOfGenomes
    GenomeSubset
    HasGenomeSubset
    Catalyzes
    Diagram
    RoleOccursIn

=over 4

=item RETURNS

Returns a statistics object for the loads.

=back
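
Roles whose names end with an EC number in parentheses get a RoleEC record. The
sketch below isolates that check. The pattern is the one used by this method, while
the wrapper C<ExtractEC> and the sample role names are supplied for illustration.

```perl
use strict;
use warnings;

# Return the EC number embedded at the end of a role name, or undef if none.
# The pattern is the one used by the loader; ExtractEC itself is a
# hypothetical wrapper.
sub ExtractEC {
    my ($roleID) = @_;
    if ($roleID =~ /\(EC ([^.]+\.[^.]+\.[^.]+\.[^)]+)\)\s*$/) {
        return $1;
    }
    return;
}

for my $role ('Histidinol dehydrogenase (EC 1.1.1.23)', 'Hypothetical protein') {
    my $ec = ExtractEC($role);
    print "$role: ", (defined $ec ? $ec : 'no EC number'), "\n";
}
```

The pattern requires four dot-separated fields but does not require them to be
digits, so partially specified numbers such as C<3.4.-.-> are also accepted.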

=cut
#: Return Type $%;
sub LoadSubsystemData {
    # Get this object instance.
    my ($self) = @_;
    # Get the FIG object.
    my $fig = $self->{fig};
    # Get the genome hash. We'll use it to filter the genomes in each
    # spreadsheet.
    my $genomeHash = $self->{genomes};
    # Get the subsystem hash. This lists the subsystems we'll process.
    my $subsysHash = $self->{subsystems};
    my @subsysIDs = sort keys %{$subsysHash};
    # Get the map list.
    my @maps = $fig->all_maps;
    # Create load objects for each of the tables we're loading.
    my $loadDiagram = $self->_TableLoader('Diagram', $self->PrimaryOnly);
    my $loadRoleOccursIn = $self->_TableLoader('RoleOccursIn', $self->PrimaryOnly);
    my $loadSubsystem = $self->_TableLoader('Subsystem');
    my $loadRole = $self->_TableLoader('Role', $self->PrimaryOnly);
    my $loadRoleEC = $self->_TableLoader('RoleEC', $self->PrimaryOnly);
    my $loadCatalyzes = $self->_TableLoader('Catalyzes', $self->PrimaryOnly);
    my $loadSSCell = $self->_TableLoader('SSCell', $self->PrimaryOnly);
    my $loadContainsFeature = $self->_TableLoader('ContainsFeature', $self->PrimaryOnly);
    my $loadIsGenomeOf = $self->_TableLoader('IsGenomeOf', $self->PrimaryOnly);
    my $loadIsRoleOf = $self->_TableLoader('IsRoleOf', $self->PrimaryOnly);
    my $loadOccursInSubsystem = $self->_TableLoader('OccursInSubsystem', $self->PrimaryOnly);
    my $loadParticipatesIn = $self->_TableLoader('ParticipatesIn', $self->PrimaryOnly);
    my $loadHasSSCell = $self->_TableLoader('HasSSCell', $self->PrimaryOnly);
    my $loadRoleSubset = $self->_TableLoader('RoleSubset', $self->PrimaryOnly);
    my $loadGenomeSubset = $self->_TableLoader('GenomeSubset', $self->PrimaryOnly);
    my $loadConsistsOfRoles = $self->_TableLoader('ConsistsOfRoles', $self->PrimaryOnly);
    my $loadConsistsOfGenomes = $self->_TableLoader('ConsistsOfGenomes', $self->PrimaryOnly);
    my $loadHasRoleSubset = $self->_TableLoader('HasRoleSubset', $self->PrimaryOnly);
    my $loadHasGenomeSubset = $self->_TableLoader('HasGenomeSubset', $self->PrimaryOnly);
    if ($self->{options}->{loadOnly}) {
        Trace("Loading from existing files.") if T(2);
    } else {
        Trace("Generating subsystem data.") if T(2);
        # This hash will contain the role for each EC. When we're done, this
        # information will be used to generate the Catalyzes table.
        my %ecToRoles = ();
        # Loop through the subsystems. Our first task will be to create the
        # roles. We do this by looping through the subsystems and creating a
        # role hash. The hash tracks each role ID so that we don't create
        # duplicates. As we move along, we'll connect the roles and subsystems
        # and memorize the reactions.
        my ($genomeID, $roleID);
        my %roleData = ();
        for my $subsysID (@subsysIDs) {
            Trace("Creating subsystem $subsysID.") if T(3);
            $loadSubsystem->Add("subsystemIn");
            # Get the subsystem object.
            my $sub = $fig->get_subsystem($subsysID);
            # Create the subsystem record.
            my $curator = $sub->get_curator();
            my $notes = $sub->get_notes();
            $loadSubsystem->Put($subsysID, $curator, $notes);
            # Connect it to its roles. Each role is a column in the subsystem spreadsheet.
            for (my $col = 0; defined($roleID = $sub->get_role($col)); $col++) {
                # Connect to this role.
                $loadOccursInSubsystem->Add("roleIn");
                $loadOccursInSubsystem->Put($roleID, $subsysID, $col);
                # If it's a new role, add it to the role table.
                if (! exists $roleData{$roleID}) {
                    # Get the role's abbreviation.
                    my $abbr = $sub->get_role_abbr($col);
                    # Add the role.
                    $loadRole->Put($roleID, $abbr);
                    $roleData{$roleID} = 1;
                    # Check for an EC number.
                    if ($roleID =~ /\(EC ([^.]+\.[^.]+\.[^.]+\.[^)]+)\)\s*$/) {
                        my $ec = $1;
                        $loadRoleEC->Put($roleID, $ec);
                        $ecToRoles{$ec} = $roleID;
                    }
                }
            }
            # Now we create the spreadsheet for the subsystem by matching roles to
            # genomes. Each genome is a row and each role is a column. We may need
            # to actually create the roles as we find them.
            Trace("Creating subsystem $subsysID spreadsheet.") if T(3);
            for (my $row = 0; defined($genomeID = $sub->get_genome($row)); $row++) {
                # Only proceed if this is one of our genomes.
                if (exists $genomeHash->{$genomeID}) {
                    # Count the PEGs and cells found for verification purposes.
                    my $pegCount = 0;
                    my $cellCount = 0;
                    # Create a list for the PEGs we find. This list will be used
                    # to generate cluster numbers.
                    my @pegsFound = ();
                    # Create a hash that maps spreadsheet IDs to PEGs. We will
                    # use this to generate the ContainsFeature data after we have
                    # the cluster numbers.
                    my %cellPegs = ();
                    # Get the genome's variant code for this subsystem.
                    my $variantCode = $sub->get_variant_code($row);
                    # Loop through the subsystem's roles. We use an index because it is
                    # part of the spreadsheet cell ID.
                    for (my $col = 0; defined($roleID = $sub->get_role($col)); $col++) {
                        # Get the features in the spreadsheet cell for this genome and role.
                        my @pegs = $sub->get_pegs_from_cell($row, $col);
                        # Only proceed if features exist.
                        if (@pegs > 0) {
                            # Create the spreadsheet cell.
                            $cellCount++;
                            my $cellID = "$subsysID:$genomeID:$col";
                            $loadSSCell->Put($cellID);
                            $loadIsGenomeOf->Put($genomeID, $cellID);
                            $loadIsRoleOf->Put($roleID, $cellID);
                            $loadHasSSCell->Put($subsysID, $cellID);
                            # Remember its features.
                            push @pegsFound, @pegs;
                            $cellPegs{$cellID} = \@pegs;
                            $pegCount += @pegs;
                        }
                    }
                    # If we found some cells for this genome, we need to compute clusters and
                    # denote it participates in the subsystem.
                    if ($pegCount > 0) {
                        Trace("$pegCount PEGs in $cellCount cells for $genomeID.") if T(3);
                        $loadParticipatesIn->Put($genomeID, $subsysID, $variantCode);
                        # Partition the PEGs found into clusters.
                        my @clusters = $fig->compute_clusters(\@pegsFound, $sub);
                        # Create a hash mapping PEG IDs to cluster numbers.
                        # We default to -1 for all of them.
                        my %clusterOf = map { $_ => -1 } @pegsFound;
                        for (my $i = 0; $i <= $#clusters; $i++) {
                            my $subList = $clusters[$i];
                            for my $peg (@{$subList}) {
                                $clusterOf{$peg} = $i;
                            }
                        }
                        # Create the ContainsFeature data.
                        for my $cellID (keys %cellPegs) {
                            my $cellList = $cellPegs{$cellID};
                            for my $cellPeg (@$cellList) {
                                $loadContainsFeature->Put($cellID, $cellPeg, $clusterOf{$cellPeg});
                            }
                        }
                    }
                }
            }
            # Now we need to generate the subsets. The subset names must be concatenated to
            # the subsystem name to make them unique keys. There are two types of subsets:
            # genome subsets and role subsets. We do the role subsets first.
            my @subsetNames = $sub->get_subset_names();
            for my $subsetID (@subsetNames) {
                # Create the subset record.
                my $actualID = "$subsysID:$subsetID";
                $loadRoleSubset->Put($actualID);
                # Connect the subset to the subsystem.
                $loadHasRoleSubset->Put($subsysID, $actualID);
                # Connect the subset to its roles.
                my @roles = $sub->get_subset($subsetID);
                for my $roleID (@roles) {
                    $loadConsistsOfRoles->Put($actualID, $roleID);
                }
            }
            # Next the genome subsets.
            @subsetNames = $sub->get_subset_namesR();
            for my $subsetID (@subsetNames) {
                # Create the subset record.
                my $actualID = "$subsysID:$subsetID";
                $loadGenomeSubset->Put($actualID);
                # Connect the subset to the subsystem.
                $loadHasGenomeSubset->Put($subsysID, $actualID);
                # Connect the subset to its genomes.
                my @genomes = $sub->get_subsetR($subsetID);
                for my $genomeID (@genomes) {
                    $loadConsistsOfGenomes->Put($actualID, $genomeID);
                }
            }
        }
        # Now we loop through the diagrams. We need to create the diagram records
        # and link each diagram to its roles. Note that only roles which occur
        # in subsystems (and therefore appear in the %ecToRoles hash) are
        # included.
        for my $map (@maps) {
            Trace("Loading diagram $map.") if T(3);
            # Get the diagram's descriptive name.
            my $name = $fig->map_name($map);
            $loadDiagram->Put($map, $name);
            # Now we need to link all the map's roles to it.
            # A hash is used to prevent duplicates.
            my %roleHash = ();
            for my $role ($fig->map_to_ecs($map)) {
                if (exists $ecToRoles{$role} && ! $roleHash{$role}) {
                    $loadRoleOccursIn->Put($ecToRoles{$role}, $map);
                    $roleHash{$role} = 1;
                }
            }
        }
        # Before we leave, we must create the Catalyzes table. We start with the reactions,
        # then use the "ecToRoles" table to convert EC numbers to role IDs.
        my @reactions = $fig->all_reactions();
        for my $reactionID (@reactions) {
            # Get this reaction's list of roles. The results will be EC numbers.
            my @roles = $fig->catalyzed_by($reactionID);
            # Loop through the roles, creating catalyzation records.
            for my $thisRole (@roles) {
                if (exists $ecToRoles{$thisRole}) {
                    $loadCatalyzes->Put($ecToRoles{$thisRole}, $reactionID);
                }
            }
        }
    }
    # Finish the load.
    my $retVal = $self->_FinishAll();
    return $retVal;
}

=head3 LoadPropertyData

C<< my $stats = $spl->LoadPropertyData(); >>

Load the attribute data from FIG into Sprout.

Attribute data in FIG corresponds to the Sprout concept of Property. As currently
implemented, each key-value attribute combination in the SEED corresponds to a
record in the B<Property> table. The B<HasProperty> relationship links the
features to the properties.

The SEED also allows attributes to be assigned to genomes, but this is not yet
supported by Sprout.

The following relations are loaded by this method.

    HasProperty
    Property

=over 4

=item RETURNS

Returns a statistics object for the loads.

=back
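
The ID assignment described above can be sketched as follows. This is a minimal,
self-contained illustration rather than the loader itself: the attribute tuples are
made up, and the two arrays merely stand in for the B<Property> and B<HasProperty>
load files.

    use strict;
    use warnings;

    # Hypothetical [fid, key, value, url] tuples, as if returned by get_attributes.
    my @attributes = (
        ['fig|83333.1.peg.1', 'essential', 'yes', ''],
        ['fig|83333.1.peg.2', 'essential', 'yes', ''],
        ['fig|83333.1.peg.2', 'virulence', 'no',  ''],
    );
    my (%propertyKeys, @property, @hasProperty);
    my $nextID = 1;
    for my $tuple (@attributes) {
        my ($fid, $key, $value, $url) = @{$tuple};
        # One Property record per distinct key/value pair.
        my $propertyKey = "$key\t$value";
        if (! exists $propertyKeys{$propertyKey}) {
            $propertyKeys{$propertyKey} = $nextID++;
            push @property, [$propertyKeys{$propertyKey}, $key, $value];
        }
        # Every feature/property association gets its own link record.
        push @hasProperty, [$fid, $propertyKeys{$propertyKey}, $url];
    }
    print scalar(@property), " properties, ", scalar(@hasProperty), " links\n";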

=cut
#: Return Type $%;
sub LoadPropertyData {
    # Get this object instance.
    my ($self) = @_;
    # Get the FIG object.
    my $fig = $self->{fig};
    # Get the genome hash.
    my $genomeHash = $self->{genomes};
    # Create load objects for each of the tables we're loading.
    my $loadProperty = $self->_TableLoader('Property');
    my $loadHasProperty = $self->_TableLoader('HasProperty', $self->PrimaryOnly);
    if ($self->{options}->{loadOnly}) {
        Trace("Loading from existing files.") if T(2);
    } else {
        Trace("Generating property data.") if T(2);
        # Create a hash for storing property IDs.
        my %propertyKeys = ();
        my $nextID = 1;
        # Loop through the genomes.
        for my $genomeID (keys %{$genomeHash}) {
            $loadProperty->Add("genomeIn");
            Trace("Generating properties for $genomeID.") if T(3);
            # Get the genome's features. The feature ID is the first field in the
            # tuples returned by "all_features_detailed". We use "all_features_detailed"
            # rather than "all_features" because we want all features regardless of type.
            my @features = map { $_->[0] } @{$fig->all_features_detailed($genomeID)};
            my $featureCount = 0;
            my $propertyCount = 0;
            # Loop through the features, creating HasProperty records.
            for my $fid (@features) {
                # Get all attributes for this feature. We do this one feature at a time
                # to ensure we do not get any genome attributes.
                my @attributeList = $fig->get_attributes($fid, '', '', '');
                if (scalar @attributeList) {
                    $featureCount++;
                }
                # Loop through the attributes.
                for my $tuple (@attributeList) {
                    $propertyCount++;
                    # Get this attribute value's data. Note that we throw away the FID,
                    # since it will always be the same as the value of "$fid".
                    my (undef, $key, $value, $url) = @{$tuple};
                    # Concatenate the key and value and check the "propertyKeys" hash to
                    # see if we already have an ID for it. We use a tab for the separator
                    # character.
                    my $propertyKey = "$key\t$value";
                    # Use the concatenated value to check for an ID. If no ID exists, we
                    # create one.
                    my $propertyID = $propertyKeys{$propertyKey};
                    if (! $propertyID) {
                        # Here we need to create a new property ID for this key/value pair.
                        $propertyKeys{$propertyKey} = $nextID;
                        $propertyID = $nextID;
                        $nextID++;
                        $loadProperty->Put($propertyID, $key, $value);
                    }
                    # Create the HasProperty entry for this feature/property association.
                    $loadHasProperty->Put($fid, $propertyID, $url);
                }
            }
            # Update the statistics.
            Trace("$propertyCount attributes processed for $featureCount features.") if T(3);
            $loadHasProperty->Add("featuresIn", $featureCount);
            $loadHasProperty->Add("propertiesIn", $propertyCount);
        }
    }
    # Finish the load.
    my $retVal = $self->_FinishAll();
    return $retVal;
}

=head3 LoadAnnotationData

C<< my $stats = $spl->LoadAnnotationData(); >>

Load the annotation data from FIG into Sprout.

Sprout annotations encompass both the assignments and the annotations in SEED.
These describe the function performed by a PEG as well as any other useful
information that may aid in identifying its purpose.

The following relations are loaded by this method.

    Annotation
    IsTargetOfAnnotation
    SproutUser
    UserAccess
    MadeAnnotation

=over 4

=item RETURNS

Returns a statistics object for the loads.

=back
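
Annotation IDs are formed from the PEG ID and the annotation's time stamp, so two
annotations on one PEG with the same time stamp need distinct keys. The following is
a minimal sketch of that step, using a made-up PEG ID and made-up time stamps. Only
the key is adjusted; the time recorded for the annotation remains the original value.

    use strict;
    use warnings;

    my $peg = 'fig|83333.1.peg.1';
    my @timestamps = (1100000000, 1100000000, 1100000001);
    my (%seenTimestamps, @annotationIDs);
    for my $timestamp (@timestamps) {
        # Bump the key until it is unused for this PEG.
        my $keyStamp = $timestamp;
        $keyStamp++ while $seenTimestamps{$keyStamp};
        $seenTimestamps{$keyStamp} = 1;
        push @annotationIDs, "$peg:$keyStamp";
    }
    print join("\n", @annotationIDs), "\n";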

=cut
#: Return Type $%;
sub LoadAnnotationData {
    # Get this object instance.
    my ($self) = @_;
    # Get the FIG object.
    my $fig = $self->{fig};
    # Get the genome hash.
    my $genomeHash = $self->{genomes};
    # Create load objects for each of the tables we're loading.
    my $loadAnnotation = $self->_TableLoader('Annotation');
    my $loadIsTargetOfAnnotation = $self->_TableLoader('IsTargetOfAnnotation', $self->PrimaryOnly);
    my $loadSproutUser = $self->_TableLoader('SproutUser', $self->PrimaryOnly);
    my $loadUserAccess = $self->_TableLoader('UserAccess', $self->PrimaryOnly);
    my $loadMadeAnnotation = $self->_TableLoader('MadeAnnotation', $self->PrimaryOnly);
    if ($self->{options}->{loadOnly}) {
        Trace("Loading from existing files.") if T(2);
    } else {
        Trace("Generating annotation data.") if T(2);
        # Create a hash of user names. We'll use this to prevent us from generating duplicate
        # user records.
        my %users = ( FIG => 1, master => 1 );
        # Put in FIG and "master".
        $loadSproutUser->Put("FIG", "Fellowship for Interpretation of Genomes");
        $loadUserAccess->Put("FIG", 1);
        $loadSproutUser->Put("master", "Master User");
        $loadUserAccess->Put("master", 1);
        # Get the current time.
        my $time = time();
        # Loop through the genomes.
        for my $genomeID (sort keys %{$genomeHash}) {
            Trace("Processing $genomeID.") if T(3);
            # Get the genome's PEGs.
            my @pegs = $fig->pegs_of($genomeID);
            for my $peg (@pegs) {
                Trace("Processing $peg.") if T(4);
                # Create a hash of timestamps. We use this to prevent duplicate time stamps
                # from showing up for a single PEG's annotations.
                my %seenTimestamps = ();
                # Loop through the annotations.
                for my $tuple ($fig->feature_annotations($peg, "raw")) {
                    my ($fid, $timestamp, $user, $text) = @{$tuple};
                    # Here we fix up the annotation text. "\r" is removed,
                    # and "\t" and "\n" are escaped. The "g" modifier applies
                    # each substitution to every occurrence in the text; the
                    # "s" modifier only affects "." and is harmless here.
                    $text =~ s/\r//gs;
                    $text =~ s/\t/\\t/gs;
                    $text =~ s/\n/\\n/gs;
                    # Change assignments by the master user to FIG assignments.
                    $text =~ s/Set master function/Set FIG function/s;
                    # Ensure the time stamp is valid.
                    if ($timestamp =~ /^\d+$/) {
                        # Here it's a number. We need to ensure the one we use to form
                        # the key is unique.
                        my $keyStamp = $timestamp;
                        while ($seenTimestamps{$keyStamp}) {
                            $keyStamp++;
                        }
                        $seenTimestamps{$keyStamp} = 1;
                        my $annotationID = "$peg:$keyStamp";
                        # Insure the user exists.
                        if (! $users{$user}) {
                            $loadSproutUser->Put($user, "SEED user");
                            $loadUserAccess->Put($user, 1);
                            $users{$user} = 1;
                        }
                        # Generate the annotation.
                        $loadAnnotation->Put($annotationID, $timestamp, $text);
                        $loadIsTargetOfAnnotation->Put($peg, $annotationID);
                        $loadMadeAnnotation->Put($user, $annotationID);
                    } else {
                        # Here we have an invalid time stamp.
                        Trace("Invalid time stamp \"$timestamp\" in annotations for $peg.") if T(1);
                    }
                }
            }
        }
    }
    # Finish the load.
    my $retVal = $self->_FinishAll();
    return $retVal;
}

=head3 LoadSourceData

C<< my $stats = $spl->LoadSourceData(); >>

Load the source data from FIG into Sprout.

Source data links genomes to information about the organizations that
mapped them.

The following relations are loaded by this method.

    ComesFrom
    Source
    SourceURL

There is no direct support for source attribution in FIG, so we access the SEED
files directly.

=over 4

=item RETURNS

Returns a statistics object for the loads.

=back
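
Only the first line of each genome's C<PROJECT> file is used. It is treated as
tab-delimited, with the source ID, a description, and a URL in that order. A minimal
sketch of the parsing, using a made-up line in place of a real file:

    use strict;
    use warnings;

    # A made-up first line of a PROJECT file: source ID, description, URL.
    my $line = "ExampleLab\tExample Sequencing Lab\thttp://www.example.org\n";
    chomp $line;
    my ($sourceID, $desc, $url) = split /\t/, $line;
    # Fall back to the source ID when no description is present.
    $desc = $sourceID if ! $desc;
    print "Source $sourceID ($desc)", ($url ? " at $url" : ""), "\n";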

=cut
#: Return Type $%;
sub LoadSourceData {
    # Get this object instance.
    my ($self) = @_;
    # Get the FIG object.
    my $fig = $self->{fig};
    # Get the genome hash.
    my $genomeHash = $self->{genomes};
    # Create load objects for each of the tables we're loading.
    my $loadComesFrom = $self->_TableLoader('ComesFrom', $self->PrimaryOnly);
    my $loadSource = $self->_TableLoader('Source');
    my $loadSourceURL = $self->_TableLoader('SourceURL');
    if ($self->{options}->{loadOnly}) {
        Trace("Loading from existing files.") if T(2);
    } else {
        Trace("Generating source data.") if T(2);
        # Create hashes to collect the Source information.
        my %sourceURL = ();
        my %sourceDesc = ();
        # Loop through the genomes.
        my $line;
        for my $genomeID (sort keys %{$genomeHash}) {
            Trace("Processing $genomeID.") if T(3);
            # Open the project file.
            if ((open(TMP, "<$FIG_Config::organisms/$genomeID/PROJECT")) &&
                defined($line = <TMP>)) {
                chomp $line;
                my($sourceID, $desc, $url) = split(/\t/,$line);
                $loadComesFrom->Put($genomeID, $sourceID);
                if ($url && ! exists $sourceURL{$sourceID}) {
                    $loadSourceURL->Put($sourceID, $url);
                    $sourceURL{$sourceID} = 1;
                }
                if ($desc) {
                    $sourceDesc{$sourceID} = $desc;
                } elsif (! exists $sourceDesc{$sourceID}) {
                    $sourceDesc{$sourceID} = $sourceID;
                }
            }
            close TMP;
        }
        # Write the source descriptions.
        for my $sourceID (keys %sourceDesc) {
            $loadSource->Put($sourceID, $sourceDesc{$sourceID});
        }
    }
    # Finish the load.
    my $retVal = $self->_FinishAll();
    return $retVal;
}

=head3 LoadExternalData

C<< my $stats = $spl->LoadExternalData(); >>

Load the external data from FIG into Sprout.

External data contains information about external feature IDs.

The following relations are loaded by this method.

    ExternalAliasFunc
    ExternalAliasOrg

The support for external IDs in FIG is hidden beneath layers of other data, so
we access the SEED files directly to create these tables. This is also one of
the few load methods that does not proceed genome by genome.

=over 4

=item RETURNS

Returns a statistics object for the loads.

=back
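
Each line of the function file is split on tabs, and an EC number found in the third
field is appended to the description. A minimal sketch of that step follows. The input
line is made up, and the field layout is inferred from the way this method parses
the file.

    use strict;
    use warnings;

    # A made-up line: external ID, description, EC number (with a stray space).
    my $funcLine = "sp|P00000\tExample kinase\tEC 2.7.1.1 \n";
    chomp $funcLine;
    my @funcFields = split /\s*\t\s*/, $funcLine;
    # If the third field holds an EC number, append it to the description.
    if ($#funcFields >= 2 && $funcFields[2] =~ /^(EC .*\S)/) {
        $funcFields[1] .= " $1";
    }
    print join("\t", @funcFields[0,1]), "\n";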

=cut
#: Return Type $%;
sub LoadExternalData {
    # Get this object instance.
    my ($self) = @_;
    # Get the FIG object.
    my $fig = $self->{fig};
    # Get the genome hash.
    my $genomeHash = $self->{genomes};
    # Convert the genome hash. We'll get the genus and species for each genome and make
    # it the key.
    my %speciesHash = map { $fig->genus_species($_) => $_ } (keys %{$genomeHash});
    # Create load objects for each of the tables we're loading.
    my $loadExternalAliasFunc = $self->_TableLoader('ExternalAliasFunc');
    my $loadExternalAliasOrg = $self->_TableLoader('ExternalAliasOrg');
    if ($self->{options}->{loadOnly}) {
        Trace("Loading from existing files.") if T(2);
    } else {
        Trace("Generating external data.") if T(2);
        # We loop through the files one at a time. First, the organism file.
        Open(\*ORGS, "<$FIG_Config::global/ext_org.table");
        my $orgLine;
        while (defined($orgLine = <ORGS>)) {
            # Clean the input line.
            chomp $orgLine;
            # Parse the organism name.
            my ($protID, $name) = split /\s*\t\s*/, $orgLine;
            $loadExternalAliasOrg->Put($protID, $name);
        }
        close ORGS;
        # Now the function file.
        my $funcLine;
        Open(\*FUNCS, "<$FIG_Config::global/ext_func.table");
        while (defined($funcLine = <FUNCS>)) {
            # Clean the line ending.
            chomp $funcLine;
            # Only proceed if the line is non-blank.
            if ($funcLine) {
                # Split it into fields.
                my @funcFields = split /\s*\t\s*/, $funcLine;
                # If there's an EC number, append it to the description.
                if ($#funcFields >= 2 && $funcFields[2] =~ /^(EC .*\S)/) {
                    $funcFields[1] .= " $1";
                }
                # Output the function line.
                $loadExternalAliasFunc->Put(@funcFields[0,1]);
            }
        }
        close FUNCS;
    }
    # Finish the load.
    my $retVal = $self->_FinishAll();
    return $retVal;
}


=head3 LoadReactionData

C<< my $stats = $spl->LoadReactionData(); >>

Load the reaction data from FIG into Sprout.

Reaction data connects reactions to the compounds that participate in them.

The following relations are loaded by this method.

    Reaction
    ReactionURL
    Compound
    CompoundName
    CompoundCAS
    IsAComponentOf

This method proceeds reaction by reaction rather than genome by genome.

=over 4

=item RETURNS

Returns a statistics object for the loads.

=back
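
Each B<IsAComponentOf> record carries a running discriminator number so that two
otherwise identical compound entries in a reaction remain distinct. A minimal sketch
of the idea, using made-up compound and reaction IDs and plain strings in place of
the load file:

    use strict;
    use warnings;

    # A made-up reaction whose substrate list names the same compound twice.
    # Tuples are [compound ID, stoichiometry, main-flag].
    my $reactionID = 'R00001';
    my @substrates = (['C00001', 1, 1], ['C00001', 1, 1]);
    my $discrim = 0;
    my @rows;
    for my $compData (@substrates) {
        my ($cid, $stoich, $main) = @{$compData};
        # The discriminator is the only field that differs between these rows.
        push @rows, join("\t", $cid, $reactionID, $discrim++, $main, $stoich);
    }
    print "$_\n" for @rows;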

=cut
#: Return Type $%;
sub LoadReactionData {
    # Get this object instance.
    my ($self) = @_;
    # Get the FIG object.
    my $fig = $self->{fig};
    # Create load objects for each of the tables we're loading.
    my $loadReaction = $self->_TableLoader('Reaction');
    my $loadReactionURL = $self->_TableLoader('ReactionURL', $self->PrimaryOnly);
    my $loadCompound = $self->_TableLoader('Compound', $self->PrimaryOnly);
    my $loadCompoundName = $self->_TableLoader('CompoundName', $self->PrimaryOnly);
    my $loadCompoundCAS = $self->_TableLoader('CompoundCAS', $self->PrimaryOnly);
    my $loadIsAComponentOf = $self->_TableLoader('IsAComponentOf', $self->PrimaryOnly);
    if ($self->{options}->{loadOnly}) {
        Trace("Loading from existing files.") if T(2);
    } else {
        Trace("Generating reaction data.") if T(2);
        # First we create the compounds.
        my @compounds = $fig->all_compounds();
        for my $cid (@compounds) {
            # Check for names.
            my @names = $fig->names_of_compound($cid);
            # Each name will be given a priority number, starting with 1.
            my $prio = 1;
            for my $name (@names) {
                $loadCompoundName->Put($cid, $name, $prio++);
            }
            # Create the main compound record. Note that the first name
            # becomes the label.
            my $label = (@names > 0 ? $names[0] : $cid);
            $loadCompound->Put($cid, $label);
            # Check for a CAS ID.
            my $cas = $fig->cas($cid);
            if ($cas) {
                $loadCompoundCAS->Put($cid, $cas);
            }
        }
        # All the compounds are set up, so we need to loop through the reactions next. First,
        # we initialize the discriminator index. This is a single integer used to ensure
        # duplicate elements in a reaction are not accidentally collapsed.
        my $discrim = 0;
        my @reactions = $fig->all_reactions();
        for my $reactionID (@reactions) {
            # Create the reaction record.
            $loadReaction->Put($reactionID, $fig->reversible($reactionID));
            # Compute the reaction's URL.
            my $url = HTML::reaction_link($reactionID);
            # Put it in the ReactionURL table.
            $loadReactionURL->Put($reactionID, $url);
            # Now we need all of the reaction's compounds. We get these in two phases,
            # substrates first and then products.
            for my $product (0, 1) {
                # Get the compounds of the current type for the current reaction. FIG will
                # give us 3-tuples: [ID, stoichiometry, main-flag]. At this time we do not
                # have location data in SEED, so it defaults to the empty string.
                my @compounds = $fig->reaction2comp($reactionID, $product);
                for my $compData (@compounds) {
                    # Extract the compound data from the current tuple.
                    my ($cid, $stoich, $main) = @{$compData};
                    # Link the compound to the reaction.
                    $loadIsAComponentOf->Put($cid, $reactionID, $discrim++, "", $main,
                                             $product, $stoich);
                }
            }
        }
    }
    # Finish the load.
    my $retVal = $self->_FinishAll();
    return $retVal;
}

=head3 LoadGroupData

C<< my $stats = $spl->LoadGroupData(); >>

Load the genome groups into Sprout.

The following relations are loaded by this method.

    GenomeGroups

There is no direct support for genome groups in FIG, so we access the SEED
files directly.

=over 4

=item RETURNS

Returns a statistics object for the loads.

=back

=cut
#: Return Type $%;
sub LoadGroupData {
    # Get this object instance.
    my ($self) = @_;
    # Get the FIG object.
    my $fig = $self->{fig};
    # Get the genome hash.
    my $genomeHash = $self->{genomes};
    # Create a load object for the table we're loading.
    my $loadGenomeGroups = $self->_TableLoader('GenomeGroups');
    if ($self->{options}->{loadOnly}) {
        Trace("Loading from existing files.") if T(2);
    } else {
        Trace("Generating group data.") if T(2);
        # Loop through the genomes.
        my $line;
        for my $genomeID (keys %{$genomeHash}) {
            Trace("Processing $genomeID.") if T(3);
            # Open the NMPDR group file for this genome.
            if (open(TMP, "<$FIG_Config::organisms/$genomeID/NMPDR") &&
                defined($line = <TMP>)) {
                # Clean the line ending.
                chomp $line;
                # Add the group to the table. Note that there can only be one group
                # per genome.
                $loadGenomeGroups->Put($genomeID, $line);
            }
            close TMP;
        }
    }
    # Finish the load.
    my $retVal = $self->_FinishAll();
    return $retVal;
}

=head2 Internal Utility Methods

=head3 TableLoader

Create an ERDBLoad object for the specified table. The object is also added to
the internal list in the C<loaders> property of this object. That enables the
L</FinishAll> method to terminate all the active loads.

This is an instance method.

=over 4

=item tableName

Name of the table (relation) being loaded.

=item ignore

TRUE if the table should be ignored entirely, else FALSE.

=item RETURN

Returns an ERDBLoad object for loading the specified table.

=back

=cut

sub _TableLoader {
    # Get the parameters.
    my ($self, $tableName, $ignore) = @_;
    # Create the load object.
    my $retVal = ERDBLoad->new($self->{erdb}, $tableName, $self->{loadDirectory}, $self->LoadOnly,
                               $ignore);
    # Cache it in the loader list.
    push @{$self->{loaders}}, $retVal;
    # Return it to the caller.
    return $retVal;
}

=head3 FinishAll

Finish all the active loads on this object.

When a load is started by L</TableLoader>, the controlling B<ERDBLoad> object is cached in
the list pointed to by the C<loaders> property of this object. This method pops the loaders
off the list and finishes them to flush out any accumulated residue.

This is an instance method.

=over 4

=item RETURN

Returns a statistics object containing the accumulated statistics for the load.

=back

=cut

sub _FinishAll {
    # Get this object instance.
    my ($self) = @_;
    # Create the statistics object.
    my $retVal = Stats->new();
    # Get the loader list.
    my $loadList = $self->{loaders};
    # Loop through the list, finishing the loads. Note that if the finish fails, we die
    # ignominiously. At some future point, we want to make the loads restartable.
    while (my $loader = pop @{$loadList}) {
        # Get the relation name.
        my $relName = $loader->RelName;
        # Check the ignore flag.
        if ($loader->Ignore) {
            Trace("Relation $relName not loaded.") if T(2);
        } else {
            # Here we really need to finish.
            Trace("Finishing $relName.") if T(2);
            my $stats = $loader->Finish();
            if ($self->{options}->{dbLoad}) {
                # Here we want to use the load file just created to load the database.
                Trace("Loading relation $relName.") if T(2);
                my $newStats = $self->{sprout}->LoadUpdate(1, [$relName]);
                # Accumulate the statistics from the DB load.
                $stats->Accumulate($newStats);
            }
            $retVal->Accumulate($stats);
            Trace("Statistics for $relName:\n" . $stats->Show()) if T(2);
        }
    }
    # Return the load statistics.
    return $retVal;
}

1;
