
View of /Sprout/SproutLoad.pm



Revision 1.82
Tue Apr 10 06:15:35 2007 UTC by parrello
Branch: MAIN
Changes since 1.81: +17 -7 lines
Added support for new fields required by the SEED Viewer.

#!/usr/bin/perl -w

package SproutLoad;

    use strict;
    use Tracer;
    use PageBuilder;
    use ERDBLoad;
    use FIG;
    use Sprout;
    use Stats;
    use BasicLocation;
    use HTML;

=head1 Sprout Load Methods

=head2 Introduction

This object contains the methods needed to copy data from the FIG data store to the
Sprout database. It makes heavy use of the ERDBLoad object to manage the load into
individual tables. The client can create an instance of this object and then
call methods for each group of tables to load. For example, the following code will
load the Genome- and Feature-related tables. (It is presumed the first command line
parameter contains the name of a file specifying the genomes.)

    my $fig = FIG->new();
    my $sprout = SFXlate->new_sprout_only();
    my $spl = SproutLoad->new($sprout, $fig, $ARGV[0]);
    my $stats = $spl->LoadGenomeData();
    $stats->Accumulate($spl->LoadFeatureData());
    print $stats->Show();

It is worth noting that the FIG object does not need to be a real one. Any object
that implements the FIG methods for data retrieval could be used. So, for example,
this object could be used to copy data from one Sprout database to another, or
from any FIG-compliant data store implemented in the future.

To ensure that this is possible, each time the FIG object is used, it will be via
a variable called C<$fig>. This makes it fairly straightforward to determine which
FIG methods are required to load the Sprout database.

This object creates the load files; however, the tables are not created until it
is time to actually do the load from the files into the target database.
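
As an illustration of the point about substitute data sources, the following is a
minimal sketch of a stand-in object that answers a few of the FIG retrieval methods
called by this module (C<genomes>, C<genus_species>, C<all_contigs>, C<contig_ln>,
and C<get_dna>). The package name C<MockFIG>, its in-memory data layout, and the
genome and contig IDs are all made up for the example. A real substitute would need
to supply every method the loaders call.

```perl
    use strict;
    use warnings;

    # Hypothetical stand-in for a FIG object; not part of the FIG library.
    package MockFIG;

    sub new {
        my ($class, $data) = @_;
        # $data maps genome ID => { name => ..., contigs => { contigID => dna } }
        return bless { data => $data }, $class;
    }

    # Return the genome IDs. The "complete only" flag is ignored in this sketch.
    sub genomes {
        my ($self) = @_;
        return sort keys %{$self->{data}};
    }

    sub genus_species {
        my ($self, $genomeID) = @_;
        return $self->{data}{$genomeID}{name};
    }

    sub all_contigs {
        my ($self, $genomeID) = @_;
        return sort keys %{$self->{data}{$genomeID}{contigs}};
    }

    sub contig_ln {
        my ($self, $genomeID, $contigID) = @_;
        return length $self->{data}{$genomeID}{contigs}{$contigID};
    }

    # Positions are treated as 1-based and inclusive, as the loader's calls imply.
    sub get_dna {
        my ($self, $genomeID, $contigID, $start, $end) = @_;
        my $dna = $self->{data}{$genomeID}{contigs}{$contigID};
        return substr($dna, $start - 1, $end - $start + 1);
    }

    package main;

    # Made-up genome with a single ten-base contig.
    my $fig = MockFIG->new({ '100.1' => { name => 'Examplus testii strain 1',
                                          contigs => { 'contigA' => 'ACGTACGTAC' } } });
    my @genomes = $fig->genomes(1);
    my $dna = $fig->get_dna('100.1', 'contigA', 3, 6);
```

Because the loader only ever reaches the source through C<$fig>, an object like this
could in principle be passed to the constructor in place of a real FIG object.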

=cut

#: Constructor SproutLoad->new();

=head2 Public Methods

=head3 new

C<< my $spl = SproutLoad->new($sprout, $fig, $genomeFile, $subsysFile, $options); >>

Construct a new Sprout Loader object, specifying the two participating databases and
the names of the files containing the lists of genomes and subsystems to use.

=over 4

=item sprout

Sprout object representing the target database. This also specifies the directory to
be used for creating the load files.

=item fig

FIG object representing the source data store from which the data is to be taken.

=item genomeFile

Either the name of the file containing the list of genomes to load or a reference to
a hash of genome IDs to access codes. If nothing is specified, all complete genomes
will be loaded and the access code will default to 1. The genome list is presumed
to be all-inclusive. In other words, all existing data in the target database will
be deleted and replaced with the data on the specified genomes. If a file is specified,
it should contain one genome ID and access code per line, tab-separated.

=item subsysFile

Either the name of the file containing the list of trusted subsystems or a reference
to a list of subsystem names. If nothing is specified, all NMPDR subsystems will be
considered trusted. (A subsystem is considered NMPDR if it has a file named C<NMPDR>
in its data directory.) Only subsystem data related to the NMPDR subsystems is loaded.

=item options

Reference to a hash of command-line options.

=back
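
The genome file format described above can be sketched as follows. This is an
illustration only: the helper name C<parse_genome_lines> and the genome IDs are
hypothetical. Unlike the constructor, the sketch takes lines that are already in
memory and skips blank ones.

```perl
    use strict;
    use warnings;

    # Hypothetical helper: turn "genomeID<TAB>accessCode" lines into a hash,
    # defaulting an omitted access code to 1 as the constructor does.
    sub parse_genome_lines {
        my @lines = @_;
        my %genomes;
        for my $line (@lines) {
            chomp $line;
            # Skipping blank lines is an addition made for this sketch.
            next if $line eq '';
            my ($genomeID, $accessCode) = split /\t/, $line;
            $accessCode = 1 if ! defined $accessCode;
            $genomes{$genomeID} = $accessCode;
        }
        return \%genomes;
    }

    my $genomes = parse_genome_lines("100.1\t2\n", "200.1\n");
```

Since the C<genomeFile> parameter also accepts a hash reference of genome IDs to
access codes, a hash built this way has the shape the constructor expects.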

=cut

sub new {
    # Get the parameters.
    my ($class, $sprout, $fig, $genomeFile, $subsysFile, $options) = @_;
    # Create the genome hash.
    my %genomes = ();
    # We only need it if load-only is NOT specified.
    if (! $options->{loadOnly}) {
        if (! defined($genomeFile) || $genomeFile eq '') {
            # Here we want all the complete genomes and an access code of 1.
            my @genomeList = $fig->genomes(1);
            %genomes = map { $_ => 1 } @genomeList;
        } else {
            my $type = ref $genomeFile;
            Trace("Genome file parameter type is \"$type\".") if T(3);
            if ($type eq 'HASH') {
                # Here the user specified a hash of genome IDs to access codes, which is
                # exactly what we want.
                %genomes = %{$genomeFile};
            } elsif (! $type || $type eq 'SCALAR' ) {
                # The caller specified a file, so read the genomes from the file. (Note
                # that some PERLs return an empty string rather than SCALAR.)
                my @genomeList = Tracer::GetFile($genomeFile);
                if (! @genomeList) {
                    # It's an error if the genome file is empty or not found.
                    Confess("No genomes found in file \"$genomeFile\".");
                } else {
                    # We build the genome Hash using a loop rather than "map" so that
                    # an omitted access code can be defaulted to 1.
                    for my $genomeLine (@genomeList) {
                        my ($genomeID, $accessCode) = split("\t", $genomeLine);
                        if (! defined($accessCode)) {
                            $accessCode = 1;
                        }
                        $genomes{$genomeID} = $accessCode;
                    }
                }
            } else {
                Confess("Invalid genome parameter ($type) in SproutLoad constructor.");
            }
        }
    }
    # Load the list of trusted subsystems.
    my %subsystems = ();
    # We only need it if load-only is NOT specified.
    if (! $options->{loadOnly}) {
        if (! defined $subsysFile || $subsysFile eq '') {
            # Here we want all the usable subsystems. First we get the whole list.
            my @subs = $fig->all_subsystems();
            # Loop through, checking for the NMPDR file.
            for my $sub (@subs) {
                if ($fig->nmpdr_subsystem($sub)) {
                    $subsystems{$sub} = 1;
                }
            }
        } else {
            my $type = ref $subsysFile;
            if ($type eq 'ARRAY') {
                # Here the user passed in a list of subsystems.
                %subsystems = map { $_ => 1 } @{$subsysFile};
            } elsif (! $type || $type eq 'SCALAR') {
                # Here the list of subsystems is in a file.
                if (! -e $subsysFile) {
                    # It's an error if the file does not exist.
                    Confess("Trusted subsystem file not found.");
                } else {
                    # GetFile automatically chomps end-of-line characters, so this
                    # is an easy task.
                    %subsystems = map { $_ => 1 } Tracer::GetFile($subsysFile);
                }
            } else {
                Confess("Invalid subsystem parameter in SproutLoad constructor.");
            }
        }
        # Go through the subsys hash again, creating the keyword list for each subsystem.
        for my $subsystem (keys %subsystems) {
            my $name = $subsystem;
            $name =~ s/_/ /g;
            my $classes = $fig->subsystem_classification($subsystem);
            $name .= " " . join(" ", @{$classes});
            $subsystems{$subsystem} = $name;
        }
    }
    # Get the data directory from the Sprout object.
    my ($directory) = $sprout->LoadInfo();
    # Create the Sprout load object.
    my $retVal = {
                  fig => $fig,
                  genomes => \%genomes,
                  subsystems => \%subsystems,
                  sprout => $sprout,
                  loadDirectory => $directory,
                  erdb => $sprout,
                  loaders => [],
                  options => $options
                 };
    # Bless and return it.
    bless $retVal, $class;
    return $retVal;
}

=head3 LoadOnly

C<< my $flag = $spl->LoadOnly; >>

Return TRUE if we are in load-only mode, else FALSE.

=cut

sub LoadOnly {
    my ($self) = @_;
    return $self->{options}->{loadOnly};
}

=head3 PrimaryOnly

C<< my $flag = $spl->PrimaryOnly; >>

Return TRUE if only the main entity is to be loaded, else FALSE.

=cut

sub PrimaryOnly {
    my ($self) = @_;
    return $self->{options}->{primaryOnly};
}

=head3 LoadGenomeData

C<< my $stats = $spl->LoadGenomeData(); >>

Load the Genome, Contig, and Sequence data from FIG into Sprout.

The Sequence table is the largest single relation in the Sprout database, so this
method is expected to be slow and clumsy. At some point we will need to make it
restartable, since an error 10 gigabytes through a 20-gigabyte load is bound to be
very annoying otherwise.

The following relations are loaded by this method.

    Genome
    HasContig
    Contig
    IsMadeUpOf
    Sequence

=over 4

=item RETURNS

Returns a statistics object for the loads.

=back
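
Each contig is cut into Sequence records no longer than the maximum sequence size.
The following minimal sketch shows that splitting in isolation. The helper name
C<chunk_contig> and the sample IDs are hypothetical, and the chunk size is passed in
directly rather than obtained from the Sprout object.

```perl
    use strict;
    use warnings;

    # Hypothetical helper: list the [sequenceID, start, length] pieces for one contig.
    sub chunk_contig {
        my ($sproutContigID, $contigLen, $chunkSize) = @_;
        die "Chunk size must be positive.\n" if $chunkSize < 1;
        my @chunks;
        for (my $i = 1; $i <= $contigLen; $i += $chunkSize) {
            # The last chunk may be shorter than $chunkSize.
            my $end = $i + $chunkSize - 1;
            $end = $contigLen if $end > $contigLen;
            # The sequence ID is the contig ID plus the 1-based start position.
            push @chunks, ["$sproutContigID.$i", $i, $end + 1 - $i];
        }
        return @chunks;
    }

    my @chunks = chunk_contig('100.1:contigA', 10, 4);
    print join("\t", @{$_}), "\n" for @chunks;
```

A ten-base contig with a chunk size of four yields three pieces, the last of which
holds the two remaining bases.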

=cut
#: Return Type $%;
sub LoadGenomeData {
    # Get this object instance.
    my ($self) = @_;
    # Get the FIG object.
    my $fig = $self->{fig};
    # Get the genome count.
    my $genomeHash = $self->{genomes};
    my $genomeCount = scalar(keys %{$genomeHash});
    # Create load objects for each of the tables we're loading.
    my $loadGenome = $self->_TableLoader('Genome');
    my $loadHasContig = $self->_TableLoader('HasContig', $self->PrimaryOnly);
    my $loadContig = $self->_TableLoader('Contig', $self->PrimaryOnly);
    my $loadIsMadeUpOf = $self->_TableLoader('IsMadeUpOf', $self->PrimaryOnly);
    my $loadSequence = $self->_TableLoader('Sequence', $self->PrimaryOnly);
    if ($self->{options}->{loadOnly}) {
        Trace("Loading from existing files.") if T(2);
    } else {
        Trace("Generating genome data.") if T(2);
        # Now we loop through the genomes, generating the data for each one.
        for my $genomeID (sort keys %{$genomeHash}) {
            Trace("Generating data for genome $genomeID.") if T(3);
            $loadGenome->Add("genomeIn");
            # The access code comes in via the genome hash.
            my $accessCode = $genomeHash->{$genomeID};
            # Get the genus, species, and strain from the scientific name.
            my ($genus, $species, @extraData) = split / /, $fig->genus_species($genomeID);
            my $extra = join " ", @extraData;
            # Get the full taxonomy.
            my $taxonomy = $fig->taxonomy_of($genomeID);
            # Get the version. If no version is specified, we default to the genome ID by itself.
            my $version = $fig->genome_version($genomeID);
            if (! defined($version)) {
                $version = $genomeID;
            }
            # Get the DNA size.
            my $dnaSize = $fig->genome_szdna($genomeID);
            # Open the NMPDR group file for this genome.
            my $group;
            if (open(my $groupHandle, '<', "$FIG_Config::organisms/$genomeID/NMPDR")) {
                # Read the first line. It stays undefined if the file is empty.
                $group = <$groupHandle>;
                close $groupHandle;
            }
            if (defined $group) {
                # Clean the line ending.
                chomp $group;
            } else {
                # No group, so use the default.
                $group = $FIG_Config::otherGroup;
            }
            # Output the genome record.
            $loadGenome->Put($genomeID, $accessCode, $fig->is_complete($genomeID),
                             $dnaSize, $genus, $group, $species, $extra, $version, $taxonomy);
            # Now we loop through each of the genome's contigs.
            my @contigs = $fig->all_contigs($genomeID);
            for my $contigID (@contigs) {
                Trace("Processing contig $contigID for $genomeID.") if T(4);
                $loadContig->Add("contigIn");
                $loadSequence->Add("contigIn");
                # Create the contig ID.
                my $sproutContigID = "$genomeID:$contigID";
                # Create the contig record and relate it to the genome.
                $loadContig->Put($sproutContigID);
                $loadHasContig->Put($genomeID, $sproutContigID);
                # Now we need to split the contig into sequences. The maximum sequence size is
                # a property of the Sprout object.
                my $chunkSize = $self->{sprout}->MaxSequence();
                # Now we get the sequence a chunk at a time.
                my $contigLen = $fig->contig_ln($genomeID, $contigID);
                for (my $i = 1; $i <= $contigLen; $i += $chunkSize) {
                    $loadSequence->Add("chunkIn");
                    # Compute the endpoint of this chunk.
                    my $end = FIG::min($i + $chunkSize - 1, $contigLen);
                    # Get the actual DNA.
                    my $dna = $fig->get_dna($genomeID, $contigID, $i, $end);
                    # Compute the sequenceID.
                    my $seqID = "$sproutContigID.$i";
                    # Write out the data. For now, the quality vector is always "unknown".
                    $loadIsMadeUpOf->Put($sproutContigID, $seqID, $end + 1 - $i, $i);
                    $loadSequence->Put($seqID, "unknown", $dna);
                }
            }
        }
    }
    # Finish the loads.
    my $retVal = $self->_FinishAll();
    # Return the result.
    return $retVal;
}

=head3 LoadCouplingData

C<< my $stats = $spl->LoadCouplingData(); >>

Load the coupling and evidence data from FIG into Sprout.

The coupling data specifies which genome features are functionally coupled. The
evidence data explains why the coupling is functional.

The following relations are loaded by this method.

    Coupling
    IsEvidencedBy
    PCH
    ParticipatesInCoupling
    UsesAsEvidence

=over 4

=item RETURNS

Returns a statistics object for the loads.

=back
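
A coupling is recorded only once even though the source reports it from both of the
coupled features. The following is a minimal sketch of that de-duplication. The
helpers C<coupling_key> and C<unique_couplings> are hypothetical, as are the feature
IDs. The real key format is whatever the C<CouplingID> method of the Sprout object
returns, which this sketch does not attempt to reproduce.

```perl
    use strict;
    use warnings;

    # Hypothetical order-independent key for a pair of features.
    sub coupling_key {
        my ($peg1, $peg2) = @_;
        my @ordered = sort { $a cmp $b } ($peg1, $peg2);
        return join(' ', @ordered);
    }

    # Keep only the first report of each pair, regardless of direction.
    sub unique_couplings {
        my @reports = @_;    # each report is [peg1, peg2, score]
        my (%seen, @kept);
        for my $report (@reports) {
            my ($peg1, $peg2, $score) = @{$report};
            my $key = coupling_key($peg1, $peg2);
            next if exists $seen{$key};
            $seen{$key} = $score;
            push @kept, [$key, $score];
        }
        return @kept;
    }

    my @kept = unique_couplings(['fig|100.1.peg.1', 'fig|100.1.peg.2', 30],
                                ['fig|100.1.peg.2', 'fig|100.1.peg.1', 30],
                                ['fig|100.1.peg.1', 'fig|100.1.peg.3', 12]);
```

Because all couplings occur within a single genome, the hash of seen pairs can be
discarded after each genome, which keeps it small.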

=cut
#: Return Type $%;
sub LoadCouplingData {
    # Get this object instance.
    my ($self) = @_;
    # Get the FIG object.
    my $fig = $self->{fig};
    # Get the genome hash.
    my $genomeFilter = $self->{genomes};
    # Set up an ID counter for the PCHs.
    my $pchID = 0;
    # Start the loads.
    my $loadCoupling = $self->_TableLoader('Coupling');
    my $loadIsEvidencedBy = $self->_TableLoader('IsEvidencedBy', $self->PrimaryOnly);
    my $loadPCH = $self->_TableLoader('PCH', $self->PrimaryOnly);
    my $loadParticipatesInCoupling = $self->_TableLoader('ParticipatesInCoupling', $self->PrimaryOnly);
    my $loadUsesAsEvidence = $self->_TableLoader('UsesAsEvidence', $self->PrimaryOnly);
    if ($self->{options}->{loadOnly}) {
        Trace("Loading from existing files.") if T(2);
    } else {
        Trace("Generating coupling data.") if T(2);
        # Loop through the genomes found.
        for my $genome (sort keys %{$genomeFilter}) {
            Trace("Generating coupling data for $genome.") if T(3);
            $loadCoupling->Add("genomeIn");
            # Create a hash table for holding coupled pairs. We use this to prevent
            # duplicates. For example, if A is coupled to B, we don't want to also
            # assert that B is coupled to A, because we already know it. Fortunately,
            # all couplings occur within a genome, so we can keep the hash table
            # size reasonably small.
            my %dupHash = ();
            # Get all of the genome's PEGs.
            my @pegs = $fig->pegs_of($genome);
            # Loop through the PEGs.
            for my $peg1 (@pegs) {
                $loadCoupling->Add("pegIn");
                Trace("Processing PEG $peg1 for $genome.") if T(4);
                # Get a list of the coupled PEGs.
                my @couplings = $fig->coupled_to($peg1);
                # For each coupled PEG, we need to verify that a coupling already
                # exists. If not, we have to create one.
                for my $coupleData (@couplings) {
                    my ($peg2, $score) = @{$coupleData};
                    # Compute the coupling ID.
                    my $coupleID = $self->{erdb}->CouplingID($peg1, $peg2);
                    if (! exists $dupHash{$coupleID}) {
                        $loadCoupling->Add("couplingIn");
                        # Here we have a new coupling to store in the load files.
                        Trace("Storing coupling ($coupleID) with score $score.") if T(4);
                        # Ensure we don't do this again.
                        $dupHash{$coupleID} = $score;
                        # Write the coupling record.
                        $loadCoupling->Put($coupleID, $score);
                        # Connect it to the coupled PEGs.
                        $loadParticipatesInCoupling->Put($peg1, $coupleID, 1);
                        $loadParticipatesInCoupling->Put($peg2, $coupleID, 2);
                        # Get the evidence for this coupling.
                        my @evidence = $fig->coupling_evidence($peg1, $peg2);
                        # Organize the evidence into a hash table.
                        my %evidenceMap = ();
                        # Process each evidence item.
                        for my $evidenceData (@evidence) {
                            $loadPCH->Add("evidenceIn");
                            my ($peg3, $peg4, $usage) = @{$evidenceData};
                            # Only proceed if the evidence is from a Sprout
                            # genome.
                            if ($genomeFilter->{$fig->genome_of($peg3)}) {
                                $loadUsesAsEvidence->Add("evidenceChosen");
                                my $evidenceKey = "$coupleID $peg3 $peg4";
                                # We store this evidence in the hash if the usage
                                # is nonzero or no prior evidence has been found. This
                                # ensures that if there is duplicate evidence, we
                                # at least keep the meaningful ones. Only evidence in
                                # the hash makes it to the output.
                                if ($usage || ! exists $evidenceMap{$evidenceKey}) {
                                    $evidenceMap{$evidenceKey} = $evidenceData;
                                }
                            }
                        }
                        for my $evidenceID (keys %evidenceMap) {
                            # Get the ID for this evidence.
                            $pchID++;
                            # Create the evidence record.
                            my ($peg3, $peg4, $usage) = @{$evidenceMap{$evidenceID}};
                            $loadPCH->Put($pchID, $usage);
                            # Connect it to the coupling.
                            $loadIsEvidencedBy->Put($coupleID, $pchID);
                            # Connect it to the features.
                            $loadUsesAsEvidence->Put($pchID, $peg3, 1);
                            $loadUsesAsEvidence->Put($pchID, $peg4, 2);
                        }
                    }
                }
            }
        }
    }
    # All done. Finish the load.
    my $retVal = $self->_FinishAll();
    return $retVal;
}

=head3 LoadFeatureData

C<< my $stats = $spl->LoadFeatureData(); >>

Load the feature data from FIG into Sprout.

Features represent annotated genes, and are therefore the heart of the data store.

The following relations are loaded by this method.

    Feature
    FeatureAlias
    FeatureLink
    FeatureTranslation
    FeatureUpstream
    IsLocatedIn
    HasFeature
    HasRoleInSubsystem
    FeatureEssential
    FeatureVirulent
    FeatureIEDB

=over 4

=item RETURNS

Returns a statistics object for the loads.

=back
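
Each Feature record carries a keyword string assembled from the feature's IDs,
aliases, assignment, and subsystem data. The following minimal sketch shows only the
final assembly step: short words are dropped, hyphenated words are additionally
split with the parts appended at the end, and parentheses and semicolons are removed.
The helper name C<build_keyword_string> is hypothetical, and the subsequent call to
the C<CleanKeywords> method of the Sprout object is omitted.

```perl
    use strict;
    use warnings;

    # Hypothetical helper mirroring the keyword assembly step of the loader.
    sub build_keyword_string {
        my @keywords = @_;
        my ($keywordString, $bustedString) = ('', '');
        for my $keyword (@keywords) {
            # Words shorter than three characters are dropped.
            next if length($keyword) < 3;
            $keywordString .= " $keyword";
            if ($keyword =~ /-/) {
                # The pieces of a hyphenated word go into a separate string so
                # that the original word order is preserved up front.
                $bustedString .= join(' ', '', split(/-/, $keyword));
            }
        }
        $keywordString .= $bustedString;
        # Strip parentheses and semicolons.
        $keywordString =~ s/[();]//g;
        return $keywordString;
    }

    my $words = build_keyword_string('of', 'beta-lactamase', '(EC', '3.5.2.6)');
```

Note that the result keeps its leading space, as the string built by the loader does
before it is cleaned.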

=cut
#: Return Type $%;
sub LoadFeatureData {
    # Get this object instance.
    my ($self) = @_;
    # Get the FIG and Sprout objects.
    my $fig = $self->{fig};
    my $sprout = $self->{sprout};
    # Get the table of genome IDs.
    my $genomeHash = $self->{genomes};
    # Create load objects for each of the tables we're loading.
    my $loadFeature = $self->_TableLoader('Feature');
    my $loadIsLocatedIn = $self->_TableLoader('IsLocatedIn', $self->PrimaryOnly);
    my $loadFeatureAlias = $self->_TableLoader('FeatureAlias');
    my $loadFeatureLink = $self->_TableLoader('FeatureLink');
    my $loadFeatureTranslation = $self->_TableLoader('FeatureTranslation');
    my $loadFeatureUpstream = $self->_TableLoader('FeatureUpstream');
    my $loadHasFeature = $self->_TableLoader('HasFeature', $self->PrimaryOnly);
    my $loadHasRoleInSubsystem = $self->_TableLoader('HasRoleInSubsystem', $self->PrimaryOnly);
    my $loadFeatureEssential = $self->_TableLoader('FeatureEssential');
    my $loadFeatureVirulent = $self->_TableLoader('FeatureVirulent');
    my $loadFeatureIEDB = $self->_TableLoader('FeatureIEDB');
    # Get the subsystem hash.
    my $subHash = $self->{subsystems};
    # Get the maximum segment size. We need this later for splitting up the
    # locations.
    my $chunkSize = $self->{sprout}->MaxSegment();
    if ($self->{options}->{loadOnly}) {
        Trace("Loading from existing files.") if T(2);
    } else {
        Trace("Generating feature data.") if T(2);
        # Now we loop through the genomes, generating the data for each one.
        for my $genomeID (sort keys %{$genomeHash}) {
            Trace("Loading features for genome $genomeID.") if T(3);
            $loadFeature->Add("genomeIn");
            # Get the feature list for this genome.
            my $features = $fig->all_features_detailed_fast($genomeID);
            # Sort and count the list.
            my @featureTuples = sort { $a->[0] cmp $b->[0] } @{$features};
            my $count = scalar @featureTuples;
            my @fids = map { $_->[0] } @featureTuples;
            Trace("$count features found for genome $genomeID.") if T(3);
            # Get the attributes for this genome and put them in a hash by feature ID.
            my $attributes = GetGenomeAttributes($fig, $genomeID, \@fids);
            # Set up for our duplicate-feature check.
            my $oldFeatureID = "";
            # Loop through the features.
            for my $featureTuple (@featureTuples) {
                # Split the tuple.
                my ($featureID, $locations, undef, $type, $minloc, $maxloc, $assignment, $user, $quality) = @{$featureTuple};
                # Check for duplicates.
                if ($featureID eq $oldFeatureID) {
                    Trace("Duplicate feature $featureID found.") if T(1);
                } else {
                    $oldFeatureID = $featureID;
                    # Count this feature.
                    $loadFeature->Add("featureIn");
                    # Fix the quality. It is almost always a space, but some odd stuff might sneak through, and the
                    # Sprout database requires a single character.
                    if (! defined($quality) || $quality eq "") {
                        $quality = " ";
                    }
                    # Begin building the keywords. We start with the genome ID, the
                    # feature ID, the taxonomy, and the organism name.
                    my @keywords = ($genomeID, $featureID, $fig->genus_species($genomeID),
                                    $fig->taxonomy_of($genomeID));
                    # Create the aliases.
                    for my $alias ($fig->feature_aliases($featureID)) {
                        $loadFeatureAlias->Put($featureID, $alias);
                        push @keywords, $alias;
                    }
                    Trace("Assignment for $featureID is: $assignment") if T(4);
                    # Break the assignment into words and shove it onto the
                    # keyword list.
                    push @keywords, split(/\s+/, $assignment);
                    # Link this feature to the parent genome.
                    $loadHasFeature->Put($genomeID, $featureID, $type);
                    # Get the links.
                    my @links = $fig->fid_links($featureID);
                    for my $link (@links) {
                        $loadFeatureLink->Put($featureID, $link);
                    }
                    # If this is a peg, generate the translation and the upstream.
                    if ($type eq 'peg') {
                        $loadFeatureTranslation->Add("pegIn");
                        my $translation = $fig->get_translation($featureID);
                        if ($translation) {
                            $loadFeatureTranslation->Put($featureID, $translation);
                        }
                        # We use the default upstream values of u=200 and c=100.
                        my $upstream = $fig->upstream_of($featureID, 200, 100);
                        if ($upstream) {
                            $loadFeatureUpstream->Put($featureID, $upstream);
                        }
                    }
                    # Now we need to find the subsystems this feature participates in.
                    # We also add the subsystems to the keyword list. Before we do that,
                    # we must convert underscores to spaces and tack on the classifications.
                    my @subsystems = $fig->peg_to_subsystems($featureID);
                    for my $subsystem (@subsystems) {
                        # Only proceed if we like this subsystem.
                        if (exists $subHash->{$subsystem}) {
                            # Store the has-role link.
                            $loadHasRoleInSubsystem->Put($featureID, $subsystem, $genomeID, $type);
                            # Save the subsystem's keyword data.
                            my $subKeywords = $subHash->{$subsystem};
                            push @keywords, split /\s+/, $subKeywords;
                            # Now we need to get this feature's role in the subsystem.
                            my $subObject = $fig->get_subsystem($subsystem);
                            my @roleColumns = $subObject->get_peg_roles($featureID);
                            my @allRoles = $subObject->get_roles();
                            for my $col (@roleColumns) {
                                my $role = $allRoles[$col];
                                push @keywords, split /\s+/, $role;
                                push @keywords, $subObject->get_role_abbr($col);
                            }
                        }
                    }
                    # There are three special attributes computed from property
                    # data that we build next. If the special attribute is non-empty,
                    # its name will be added to the keyword list. First, we get all
                    # the attributes for this feature. They will come back as
                    # 4-tuples: [peg, name, value, URL]. We use a 3-tuple instead:
                    # [name, value, value with URL]. (We don't need the PEG, since
                    # we already know it.)
                    my @attributes = map { [$_->[1], $_->[2], Tracer::CombineURL($_->[2], $_->[3])] }
                                         @{$attributes->{$featureID}};
                    # Now we process each of the special attributes.
                    if (SpecialAttribute($featureID, \@attributes,
                                         1, [0,2], '^(essential|potential_essential)$',
                                         $loadFeatureEssential)) {
                        push @keywords, 'essential';
                        $loadFeature->Add('essential');
                    }
                    if (SpecialAttribute($featureID, \@attributes,
                                         0, [2], '^virulen',
                                         $loadFeatureVirulent)) {
                        push @keywords, 'virulent';
                        $loadFeature->Add('virulent');
                    }
                    if (SpecialAttribute($featureID, \@attributes,
                                         0, [0,2], '^iedb_',
                                         $loadFeatureIEDB)) {
                        push @keywords, 'iedb';
                        $loadFeature->Add('iedb');
                    }
                    # Now we need to bust up hyphenated words in the keyword
                    # list. We keep them separate and put them at the end so
                    # the original word order is available.
                    my $keywordString = "";
                    my $bustedString = "";
                    for my $keyword (@keywords) {
                        if (length $keyword >= 3) {
                            $keywordString .= " $keyword";
                            if ($keyword =~ /-/) {
                                my @words = split /-/, $keyword;
                                $bustedString .= join(" ", "", @words);
                            }
                        }
                    }
                    $keywordString .= $bustedString;
                    # Get rid of annoying punctuation.
                    $keywordString =~ s/[();]//g;
                    # Clean the keyword list.
                    my $cleanWords = $sprout->CleanKeywords($keywordString);
                    Trace("Keyword string for $featureID: $cleanWords") if T(4);
                    # Create the feature record.
                    $loadFeature->Put($featureID, 1, $user, $quality, $type, $assignment, $cleanWords);
                    # This part is the roughest. We need to relate the features to contig
                    # locations, and the locations must be split so that none of them exceed
                    # the maximum segment size. This simplifies the genes_in_region processing
                    # for Sprout.
                    my @locationList = split /\s*,\s*/, $locations;
                    # Create the location position indicator.
                    my $i = 1;
                    # Loop through the locations.
                    for my $location (@locationList) {
                        # Parse the location.
                        my $locObject = BasicLocation->new("$genomeID:$location");
                        # Split it into a list of chunks.
                        my @locOList = ();
                        while (my $peeling = $locObject->Peel($chunkSize)) {
                            $loadIsLocatedIn->Add("peeling");
                            push @locOList, $peeling;
                        }
                        push @locOList, $locObject;
                        # Loop through the chunks, creating IsLocatedIn records. The variable
                        # "$i" will be used to keep the location index.
                        for my $locChunk (@locOList) {
                            $loadIsLocatedIn->Put($featureID, $locChunk->Contig, $locChunk->Left,
                                                  $locChunk->Dir, $locChunk->Length, $i);
                            $i++;
                        }
                    }
                }
            }
        }
    }
    # Finish the loads.
    my $retVal = $self->_FinishAll();
    return $retVal;
}

=head3 LoadSubsystemData

C<< my $stats = $spl->LoadSubsystemData(); >>

Load the subsystem data from FIG into Sprout.

Subsystems are groupings of genetic roles that work together to effect a specific
chemical reaction. Similar organisms require similar subsystems. To curate a subsystem,
a spreadsheet is created with genomes on one axis and subsystem roles on the other
axis. Similar features are then mapped into the cells, allowing the annotation of one
genome's roles to be used to assist in the annotation of others.

The following relations are loaded by this method.

    Subsystem
    SubsystemClass
    Role
    RoleEC
    SSCell
    ContainsFeature
    IsGenomeOf
    IsRoleOf
    OccursInSubsystem
    ParticipatesIn
    HasSSCell
    ConsistsOfRoles
    RoleSubset
    HasRoleSubset
    ConsistsOfGenomes
    GenomeSubset
    HasGenomeSubset
    Catalyzes
    Diagram
    RoleOccursIn

=over 4

=item RETURNS

Returns a statistics object for the loads.

=back
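
As an illustration of two conventions used by this method, the following
stand-alone sketch extracts an EC number from the end of a role name with the
same pattern the method uses, and builds a spreadsheet cell ID in the
C<subsystem:genome:column> form. The helper names C<ec_of_role> and C<cell_id>
and all sample values are hypothetical and exist only for this example.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: return the EC number at the end of a role name,
    # or an undefined value if the name does not end with "(EC n.n.n.n)".
    sub ec_of_role {
        my ($roleID) = @_;
        if ($roleID =~ /\(EC ([^.]+\.[^.]+\.[^.]+\.[^)]+)\)\s*$/) {
            return $1;
        }
        return;
    }

    # Hypothetical helper: form the SSCell key from its three parts.
    sub cell_id {
        my ($subsysID, $genomeID, $col) = @_;
        return "$subsysID:$genomeID:$col";
    }

    # Sample values, invented for the example.
    my $ec = ec_of_role("Alcohol dehydrogenase (EC 1.1.1.1)");
    my $noEC = ec_of_role("Hypothetical protein");
    my $cell = cell_id("Sample_Subsystem", "83333.1", 2);

```

Because the column index is part of the cell ID, the same role in two different
subsystems always produces two different cells.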

=cut
#: Return Type $%;
sub LoadSubsystemData {
    # Get this object instance.
    my ($self) = @_;
    # Get the FIG object.
    my $fig = $self->{fig};
    # Get the genome hash. We'll use it to filter the genomes in each
    # spreadsheet.
    my $genomeHash = $self->{genomes};
    # Get the subsystem hash. This lists the subsystems we'll process.
    my $subsysHash = $self->{subsystems};
    my @subsysIDs = sort keys %{$subsysHash};
    # Get the map list.
    my @maps = $fig->all_maps;
    # Create load objects for each of the tables we're loading.
    my $loadDiagram = $self->_TableLoader('Diagram', $self->PrimaryOnly);
    my $loadRoleOccursIn = $self->_TableLoader('RoleOccursIn', $self->PrimaryOnly);
    my $loadSubsystem = $self->_TableLoader('Subsystem');
    my $loadRole = $self->_TableLoader('Role', $self->PrimaryOnly);
    my $loadRoleEC = $self->_TableLoader('RoleEC', $self->PrimaryOnly);
    my $loadCatalyzes = $self->_TableLoader('Catalyzes', $self->PrimaryOnly);
    my $loadSSCell = $self->_TableLoader('SSCell', $self->PrimaryOnly);
    my $loadContainsFeature = $self->_TableLoader('ContainsFeature', $self->PrimaryOnly);
    my $loadIsGenomeOf = $self->_TableLoader('IsGenomeOf', $self->PrimaryOnly);
    my $loadIsRoleOf = $self->_TableLoader('IsRoleOf', $self->PrimaryOnly);
    my $loadOccursInSubsystem = $self->_TableLoader('OccursInSubsystem', $self->PrimaryOnly);
    my $loadParticipatesIn = $self->_TableLoader('ParticipatesIn', $self->PrimaryOnly);
    my $loadHasSSCell = $self->_TableLoader('HasSSCell', $self->PrimaryOnly);
    my $loadRoleSubset = $self->_TableLoader('RoleSubset', $self->PrimaryOnly);
    my $loadGenomeSubset = $self->_TableLoader('GenomeSubset', $self->PrimaryOnly);
    my $loadConsistsOfRoles = $self->_TableLoader('ConsistsOfRoles', $self->PrimaryOnly);
    my $loadConsistsOfGenomes = $self->_TableLoader('ConsistsOfGenomes', $self->PrimaryOnly);
    my $loadHasRoleSubset = $self->_TableLoader('HasRoleSubset', $self->PrimaryOnly);
    my $loadHasGenomeSubset = $self->_TableLoader('HasGenomeSubset', $self->PrimaryOnly);
    my $loadSubsystemClass = $self->_TableLoader('SubsystemClass', $self->PrimaryOnly);
    if ($self->{options}->{loadOnly}) {
        Trace("Loading from existing files.") if T(2);
    } else {
        Trace("Generating subsystem data.") if T(2);
        # This hash will contain the role for each EC. When we're done, this
        # information will be used to generate the Catalyzes table.
        my %ecToRoles = ();
        # Loop through the subsystems. Our first task will be to create the
        # roles. We do this by looping through the subsystems and creating a
        # role hash. The hash tracks each role ID so that we don't create
        # duplicates. As we move along, we'll connect the roles and subsystems
        # and remember each role's EC number for the reaction processing below.
        my ($genomeID, $roleID);
        my %roleData = ();
        for my $subsysID (@subsysIDs) {
            # Get the subsystem object.
            my $sub = $fig->get_subsystem($subsysID);
            # Only proceed if the subsystem has a spreadsheet.
            if (! $sub->{empty_ss}) {
                Trace("Creating subsystem $subsysID.") if T(3);
                $loadSubsystem->Add("subsystemIn");
                # Create the subsystem record.
                my $curator = $sub->get_curator();
                my $notes = $sub->get_notes();
                $loadSubsystem->Put($subsysID, $curator, $notes);
                # Now for the classification string. This comes back as a list
                # reference and we convert it to a string delimited by
                # $FIG_Config::splitter, skipping any empty entries.
                my $classList = $fig->subsystem_classification($subsysID);
                my $classString = join($FIG_Config::splitter, grep { $_ } @$classList);
                $loadSubsystemClass->Put($subsysID, $classString);
                # Connect it to its roles. Each role is a column in the subsystem spreadsheet.
                for (my $col = 0; defined($roleID = $sub->get_role($col)); $col++) {
                    # Connect to this role.
                    $loadOccursInSubsystem->Add("roleIn");
                    $loadOccursInSubsystem->Put($roleID, $subsysID, $col);
                    # If it's a new role, add it to the role table.
                    if (! exists $roleData{$roleID}) {
                        # Get the role's abbreviation.
                        my $abbr = $sub->get_role_abbr($col);
                        # Add the role.
                        $loadRole->Put($roleID, $abbr);
                        $roleData{$roleID} = 1;
                        # Check for an EC number.
                        if ($roleID =~ /\(EC ([^.]+\.[^.]+\.[^.]+\.[^)]+)\)\s*$/) {
                            my $ec = $1;
                            $loadRoleEC->Put($roleID, $ec);
                            $ecToRoles{$ec} = $roleID;
                        }
                    }
                }
                # Now we create the spreadsheet for the subsystem by matching roles to
                # genomes. Each genome is a row and each role is a column. We may need
                # to actually create the roles as we find them.
                Trace("Creating subsystem $subsysID spreadsheet.") if T(3);
                for (my $row = 0; defined($genomeID = $sub->get_genome($row)); $row++) {
                    # Only proceed if this is one of our genomes.
                    if (exists $genomeHash->{$genomeID}) {
                        # Count the PEGs and cells found for verification purposes.
                        my $pegCount = 0;
                        my $cellCount = 0;
                        # Create a list for the PEGs we find. This list will be used
                        # to generate cluster numbers.
                        my @pegsFound = ();
                        # Create a hash that maps spreadsheet IDs to PEGs. We will
                        # use this to generate the ContainsFeature data after we have
                        # the cluster numbers.
                        my %cellPegs = ();
                        # Get the genome's variant code for this subsystem.
                        my $variantCode = $sub->get_variant_code($row);
                        # Loop through the subsystem's roles. We use an index because it is
                        # part of the spreadsheet cell ID.
                        for (my $col = 0; defined($roleID = $sub->get_role($col)); $col++) {
                            # Get the features in the spreadsheet cell for this genome and role.
                            my @pegs = grep { !$fig->is_deleted_fid($_) } $sub->get_pegs_from_cell($row, $col);
                            # Only proceed if features exist.
                            if (@pegs > 0) {
                                # Create the spreadsheet cell.
                                $cellCount++;
                                my $cellID = "$subsysID:$genomeID:$col";
                                $loadSSCell->Put($cellID);
                                $loadIsGenomeOf->Put($genomeID, $cellID);
                                $loadIsRoleOf->Put($roleID, $cellID);
                                $loadHasSSCell->Put($subsysID, $cellID);
                                # Remember its features.
                                push @pegsFound, @pegs;
                                $cellPegs{$cellID} = \@pegs;
                                $pegCount += @pegs;
                            }
                        }
                        # If we found some cells for this genome, we need to compute clusters and
                        # record that the genome participates in the subsystem.
                        if ($pegCount > 0) {
                            Trace("$pegCount PEGs in $cellCount cells for $genomeID.") if T(3);
                            $loadParticipatesIn->Put($genomeID, $subsysID, $variantCode);
                            # Create a hash mapping PEG IDs to cluster numbers.
                            # We default to -1 for all of them.
                            my %clusterOf = map { $_ => -1 } @pegsFound;
                            # Partition the PEGs found into clusters.
                            my @clusters = $fig->compute_clusters([keys %clusterOf], $sub);
                            for (my $i = 0; $i <= $#clusters; $i++) {
                                my $subList = $clusters[$i];
                                for my $peg (@{$subList}) {
                                    $clusterOf{$peg} = $i;
                                }
                            }
                            # Create the ContainsFeature data.
                            for my $cellID (keys %cellPegs) {
                                my $cellList = $cellPegs{$cellID};
                                for my $cellPeg (@$cellList) {
                                    $loadContainsFeature->Put($cellID, $cellPeg, $clusterOf{$cellPeg});
                                }
                            }
                        }
                    }
                }
                # Now we need to generate the subsets. The subset names must be concatenated to
                # the subsystem name to make them unique keys. There are two types of subsets:
                # genome subsets and role subsets. We do the role subsets first.
                my @subsetNames = $sub->get_subset_names();
                for my $subsetID (@subsetNames) {
                    # Create the subset record.
                    my $actualID = "$subsysID:$subsetID";
                    $loadRoleSubset->Put($actualID);
                    # Connect the subset to the subsystem.
                    $loadHasRoleSubset->Put($subsysID, $actualID);
                    # Connect the subset to its roles.
                    my @roles = $sub->get_subsetC_roles($subsetID);
                    for my $roleID (@roles) {
                        $loadConsistsOfRoles->Put($actualID, $roleID);
                    }
                }
                # Next the genome subsets.
                @subsetNames = $sub->get_subset_namesR();
                for my $subsetID (@subsetNames) {
                    # Create the subset record.
                    my $actualID = "$subsysID:$subsetID";
                    $loadGenomeSubset->Put($actualID);
                    # Connect the subset to the subsystem.
                    $loadHasGenomeSubset->Put($subsysID, $actualID);
                    # Connect the subset to its genomes.
                    my @genomes = $sub->get_subsetR($subsetID);
                    for my $genomeID (@genomes) {
                        $loadConsistsOfGenomes->Put($actualID, $genomeID);
                    }
                }
            }
        }
        # Now we loop through the diagrams. We need to create the diagram records
        # and link each diagram to its roles. Note that only roles which occur
        # in subsystems (and therefore appear in the %ecToRoles hash) are
        # included.
        for my $map (@maps) {
            Trace("Loading diagram $map.") if T(3);
            # Get the diagram's descriptive name.
            my $name = $fig->map_name($map);
            $loadDiagram->Put($map, $name);
            # Now we need to link all the map's roles to it.
            # A hash is used to prevent duplicates.
            my %roleHash = ();
            for my $role ($fig->map_to_ecs($map)) {
                if (exists $ecToRoles{$role} && ! $roleHash{$role}) {
                    $loadRoleOccursIn->Put($ecToRoles{$role}, $map);
                    $roleHash{$role} = 1;
                }
            }
        }
        # Before we leave, we must create the Catalyzes table. We start with the reactions,
        # then use the "ecToRoles" table to convert EC numbers to role IDs.
        my @reactions = $fig->all_reactions();
        for my $reactionID (@reactions) {
            # Get this reaction's list of roles. The results will be EC numbers.
            my @roles = $fig->catalyzed_by($reactionID);
            # Loop through the roles, creating catalyzation records.
            for my $thisRole (@roles) {
                if (exists $ecToRoles{$thisRole}) {
                    $loadCatalyzes->Put($ecToRoles{$thisRole}, $reactionID);
                }
            }
        }
    }
    # Finish the load.
    my $retVal = $self->_FinishAll();
    return $retVal;
}

=head3 LoadPropertyData

C<< my $stats = $spl->LoadPropertyData(); >>

Load the attribute data from FIG into Sprout.

Attribute data in FIG corresponds to the Sprout concept of Property. As currently
implemented, each key-value attribute combination in the SEED corresponds to a
record in the B<Property> table. The B<HasProperty> relationship links the
features to the properties.

The SEED also allows attributes to be assigned to genomes, but this is not yet
supported by Sprout.

The following relations are loaded by this method.

    HasProperty
    Property

=over 4

=item RETURNS

Returns a statistics object for the loads.

=back
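
The ID assignment can be pictured with the following stand-alone sketch. It
assumes the attribute tuples have the C<[fid, key, value, url]> layout used in
the method body; the sample tuples and the two row lists that stand in for the
B<Property> and B<HasProperty> load files are hypothetical.

```perl

    use strict;
    use warnings;

    # Maps "key<tab>value" to the property ID already assigned to it.
    my %propertyKeys;
    my $nextID = 1;
    # Stand-ins for the Property and HasProperty load files.
    my (@propertyRows, @hasPropertyRows);

    # Hypothetical sample tuples.
    my @tuples = (
        ["fig|83333.1.peg.1", "essential", "yes", ""],
        ["fig|83333.1.peg.2", "essential", "yes", ""],
        ["fig|83333.1.peg.2", "essential", "no",  ""],
    );
    for my $tuple (@tuples) {
        my ($fid, $key, $value, $url) = @{$tuple};
        my $propertyKey = "$key\t$value";
        my $propertyID = $propertyKeys{$propertyKey};
        if (! $propertyID) {
            # First time we have seen this key/value pair.
            $propertyID = $nextID++;
            $propertyKeys{$propertyKey} = $propertyID;
            push @propertyRows, [$propertyID, $key, $value];
        }
        push @hasPropertyRows, [$fid, $propertyID, $url];
    }

```

Starting the IDs at 1 matters here: the test for an existing ID is a simple
truth test, so an ID of 0 would be mistaken for a missing entry.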

=cut
#: Return Type $%;
sub LoadPropertyData {
    # Get this object instance.
    my ($self) = @_;
    # Get the FIG object.
    my $fig = $self->{fig};
    # Get the genome hash.
    my $genomeHash = $self->{genomes};
    # Create load objects for each of the tables we're loading.
    my $loadProperty = $self->_TableLoader('Property');
    my $loadHasProperty = $self->_TableLoader('HasProperty', $self->PrimaryOnly);
    if ($self->{options}->{loadOnly}) {
        Trace("Loading from existing files.") if T(2);
    } else {
        Trace("Generating property data.") if T(2);
        # Create a hash for storing property IDs.
        my %propertyKeys = ();
        my $nextID = 1;
        # Loop through the genomes.
        for my $genomeID (sort keys %{$genomeHash}) {
            $loadProperty->Add("genomeIn");
            Trace("Generating properties for $genomeID.") if T(3);
            # Get the genome's features. The feature ID is the first field in the
            # tuples returned by "all_features_detailed". We use "all_features_detailed"
            # rather than "all_features" because we want all features regardless of type.
            my @features = map { $_->[0] } @{$fig->all_features_detailed($genomeID)};
            my $featureCount = 0;
            my $propertyCount = 0;
            # Get the properties for this genome's features.
            my $attributes = GetGenomeAttributes($fig, $genomeID, \@features);
            Trace("Property hash built for $genomeID.") if T(3);
            # Loop through the features, creating HasProperty records.
            for my $fid (@features) {
                # Get all attributes for this feature. We do this one feature at a time
                # to insure we do not get any genome attributes.
                my @attributeList = @{$attributes->{$fid}};
                if (scalar @attributeList) {
                    $featureCount++;
                }
                # Loop through the attributes.
                for my $tuple (@attributeList) {
                    $propertyCount++;
                    # Get this attribute value's data. Note that we throw away the FID,
                    # since it will always be the same as the value of "$fid".
                    my (undef, $key, $value, $url) = @{$tuple};
                    # Concatenate the key and value and check the "propertyKeys" hash to
                    # see if we already have an ID for it. We use a tab for the separator
                    # character.
                    my $propertyKey = "$key\t$value";
                    # Use the concatenated value to check for an ID. If no ID exists, we
                    # create one.
                    my $propertyID = $propertyKeys{$propertyKey};
                    if (! $propertyID) {
                        # Here we need to create a new property ID for this key/value pair.
                        $propertyKeys{$propertyKey} = $nextID;
                        $propertyID = $nextID;
                        $nextID++;
                        $loadProperty->Put($propertyID, $key, $value);
                    }
                    # Create the HasProperty entry for this feature/property association.
                    $loadHasProperty->Put($fid, $propertyID, $url);
                }
            }
            # Update the statistics.
            Trace("$propertyCount attributes processed for $featureCount features.") if T(3);
            $loadHasProperty->Add("featuresIn", $featureCount);
            $loadHasProperty->Add("propertiesIn", $propertyCount);
        }
    }
    # Finish the load.
    my $retVal = $self->_FinishAll();
    return $retVal;
}

=head3 LoadAnnotationData

C<< my $stats = $spl->LoadAnnotationData(); >>

Load the annotation data from FIG into Sprout.

Sprout annotations encompass both the assignments and the annotations in SEED.
These describe the function performed by a PEG as well as any other useful
information that may aid in identifying its purpose.

The following relations are loaded by this method.

    Annotation
    IsTargetOfAnnotation
    SproutUser
    UserAccess
    MadeAnnotation

=over 4

=item RETURNS

Returns a statistics object for the loads.

=back
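
Two details of this load are easy to miss: annotation IDs are made unique by
bumping the time stamp portion of the key, and the annotation text is escaped
so that it fits on a single line of a tab-delimited load file. The following
stand-alone sketch shows both. The helper names C<annotation_id> and
C<escape_text> are hypothetical, as are the sample values.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: return a unique "peg:timestamp" key, incrementing
    # the time stamp portion until the key has not been seen before.
    sub annotation_id {
        my ($seen, $peg, $timestamp) = @_;
        my $keyStamp = $timestamp;
        while ($seen->{"$peg:$keyStamp"}) {
            $keyStamp++;
        }
        my $annotationID = "$peg:$keyStamp";
        $seen->{$annotationID} = 1;
        return $annotationID;
    }

    # Hypothetical helper: remove "\r" and escape tabs and new-lines.
    sub escape_text {
        my ($text) = @_;
        $text =~ s/\r//gs;
        $text =~ s/\t/\\t/gs;
        $text =~ s/\n/\\n/gs;
        return $text;
    }

    my %seenTimestamps;
    my $first = annotation_id(\%seenTimestamps, "fig|83333.1.peg.1", 1000);
    my $second = annotation_id(\%seenTimestamps, "fig|83333.1.peg.1", 1000);
    my $escaped = escape_text("line one\r\nline\ttwo");

```

Note that only the key is adjusted. The time stamp stored in the annotation
record itself remains the original value.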

=cut
#: Return Type $%;
sub LoadAnnotationData {
    # Get this object instance.
    my ($self) = @_;
    # Get the FIG object.
    my $fig = $self->{fig};
    # Get the genome hash.
    my $genomeHash = $self->{genomes};
    # Create load objects for each of the tables we're loading.
    my $loadAnnotation = $self->_TableLoader('Annotation');
    my $loadIsTargetOfAnnotation = $self->_TableLoader('IsTargetOfAnnotation', $self->PrimaryOnly);
    my $loadSproutUser = $self->_TableLoader('SproutUser', $self->PrimaryOnly);
    my $loadUserAccess = $self->_TableLoader('UserAccess', $self->PrimaryOnly);
    my $loadMadeAnnotation = $self->_TableLoader('MadeAnnotation', $self->PrimaryOnly);
    if ($self->{options}->{loadOnly}) {
        Trace("Loading from existing files.") if T(2);
    } else {
        Trace("Generating annotation data.") if T(2);
        # Create a hash of user names. We'll use this to prevent us from generating duplicate
        # user records.
        my %users = ( FIG => 1, master => 1 );
        # Put in FIG and "master".
        $loadSproutUser->Put("FIG", "Fellowship for Interpretation of Genomes");
        $loadUserAccess->Put("FIG", 1);
        $loadSproutUser->Put("master", "Master User");
        $loadUserAccess->Put("master", 1);
        # Get the current time.
        my $time = time();
        # Loop through the genomes.
        for my $genomeID (sort keys %{$genomeHash}) {
            Trace("Processing $genomeID.") if T(3);
            # Create a hash of timestamps. We use this to prevent duplicate time stamps
            # from showing up for a single PEG's annotations.
            my %seenTimestamps = ();
            # Get the genome's annotations.
            my @annotations = $fig->read_all_annotations($genomeID);
            Trace("Processing annotations.") if T(2);
            for my $tuple (@annotations) {
                # Get the annotation tuple.
                my ($peg, $timestamp, $user, $text) = @{$tuple};
                # Here we fix up the annotation text. "\r" is removed,
                # and "\t" and "\n" are escaped. Note we use the "gs"
                # modifier so that new-lines inside the text do not
                # stop the substitution search.
                $text =~ s/\r//gs;
                $text =~ s/\t/\\t/gs;
                $text =~ s/\n/\\n/gs;
                # Change assignments by the master user to FIG assignments.
                $text =~ s/Set master function/Set FIG function/s;
                # Insure the time stamp is valid.
                if ($timestamp =~ /^\d+$/) {
                    # Here it's a number. We need to insure the one we use to form
                    # the key is unique.
                    my $keyStamp = $timestamp;
                    while ($seenTimestamps{"$peg:$keyStamp"}) {
                        $keyStamp++;
                    }
                    my $annotationID = "$peg:$keyStamp";
                    $seenTimestamps{$annotationID} = 1;
                    # Insure the user exists.
                    if (! $users{$user}) {
                        $loadSproutUser->Put($user, "SEED user");
                        $loadUserAccess->Put($user, 1);
                        $users{$user} = 1;
                    }
                    # Generate the annotation.
                    $loadAnnotation->Put($annotationID, $timestamp, $text);
                    $loadIsTargetOfAnnotation->Put($peg, $annotationID);
                    $loadMadeAnnotation->Put($user, $annotationID);
                } else {
                    # Here we have an invalid time stamp.
                    Trace("Invalid time stamp \"$timestamp\" in annotations for $peg.") if T(1);
                }
            }
        }
    }
    # Finish the load.
    my $retVal = $self->_FinishAll();
    return $retVal;
}

=head3 LoadSourceData

C<< my $stats = $spl->LoadSourceData(); >>

Load the source data from FIG into Sprout.

Source data links genomes to information about the organizations that
mapped them.

The following relations are loaded by this method.

    ComesFrom
    Source
    SourceURL

There is no direct support for source attribution in FIG, so we access the SEED
files directly.

=over 4

=item RETURNS

Returns a statistics object for the loads.

=back
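
The method reads only the first line of each genome's C<PROJECT> file and
treats it as up to three tab-delimited fields: source ID, description, and URL.
The following stand-alone sketch parses such a line. The helper name
C<parse_project_line> and the sample lines are hypothetical, and the fallback
is simplified: the method itself keeps a description found for the same source
in an earlier genome before falling back to the source ID.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: split a PROJECT line into its three fields.
    sub parse_project_line {
        my ($line) = @_;
        chomp $line;
        my ($sourceID, $desc, $url) = split(/\t/, $line);
        # Fall back to the source ID when no description is given.
        $desc = $sourceID if ! $desc;
        return ($sourceID, $desc, $url);
    }

    # Sample lines, invented for the example.
    my ($id, $desc, $url) =
        parse_project_line("SampleCenter\tA sample center\thttp://www.example.org\n");
    my ($id2, $desc2, $url2) = parse_project_line("OtherCenter\n");

```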

=cut
#: Return Type $%;
sub LoadSourceData {
    # Get this object instance.
    my ($self) = @_;
    # Get the FIG object.
    my $fig = $self->{fig};
    # Get the genome hash.
    my $genomeHash = $self->{genomes};
    # Create load objects for each of the tables we're loading.
    my $loadComesFrom = $self->_TableLoader('ComesFrom', $self->PrimaryOnly);
    my $loadSource = $self->_TableLoader('Source');
    my $loadSourceURL = $self->_TableLoader('SourceURL');
    if ($self->{options}->{loadOnly}) {
        Trace("Loading from existing files.") if T(2);
    } else {
        Trace("Generating source data.") if T(2);
        # Create hashes to collect the Source information.
        my %sourceURL = ();
        my %sourceDesc = ();
        # Loop through the genomes.
        my $line;
        for my $genomeID (sort keys %{$genomeHash}) {
            Trace("Processing $genomeID.") if T(3);
            # Open the project file.
            if (open(TMP, "<$FIG_Config::organisms/$genomeID/PROJECT")) {
                # Only the first line of the file is used.
                if (defined($line = <TMP>)) {
                    chomp $line;
                    my($sourceID, $desc, $url) = split(/\t/,$line);
                    $loadComesFrom->Put($genomeID, $sourceID);
                    if ($url && ! exists $sourceURL{$sourceID}) {
                        $loadSourceURL->Put($sourceID, $url);
                        $sourceURL{$sourceID} = 1;
                    }
                    if ($desc) {
                        $sourceDesc{$sourceID} = $desc;
                    } elsif (! exists $sourceDesc{$sourceID}) {
                        $sourceDesc{$sourceID} = $sourceID;
                    }
                }
                # Close the file only if the open succeeded.
                close TMP;
            }
        }
        # Write the source descriptions.
        for my $sourceID (keys %sourceDesc) {
            $loadSource->Put($sourceID, $sourceDesc{$sourceID});
        }
    }
    # Finish the load.
    my $retVal = $self->_FinishAll();
    return $retVal;
}

=head3 LoadExternalData

C<< my $stats = $spl->LoadExternalData(); >>

Load the external data from FIG into Sprout.

External data contains information about external feature IDs.

The following relations are loaded by this method.

    ExternalAliasFunc
    ExternalAliasOrg

The support for external IDs in FIG is hidden beneath layers of other data, so
we access the SEED files directly to create these tables. This is also one of
the few load methods that does not proceed genome by genome.

=over 4

=item RETURNS

Returns a statistics object for the loads.

=back
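
The handling of a function line can be tried in isolation with the following
sketch. It assumes, as the method body does, that each line holds tab-delimited
fields with the external ID first, the function description second, and an
optional third field beginning with C<EC>. The helper name C<parse_func_line>
and the sample lines are hypothetical.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: return the external ID and the description, with
    # the EC number (if any) appended to the description.
    sub parse_func_line {
        my ($funcLine) = @_;
        chomp $funcLine;
        my @funcFields = split /\s*\t\s*/, $funcLine;
        if ($#funcFields >= 2 && $funcFields[2] =~ /^(EC .*\S)/) {
            $funcFields[1] .= " $1";
        }
        return @funcFields[0,1];
    }

    # Sample lines, invented for the example.
    my ($protID, $func) = parse_func_line("sp|P00001\tSample kinase\tEC 2.7.1.1 \n");
    my ($protID2, $func2) = parse_func_line("sp|P00002\tSample transporter\n");

```

The C<\S> at the end of the pattern stops the capture at the last non-blank
character, so trailing white space in the third field is not copied into the
description.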

=cut
#: Return Type $%;
sub LoadExternalData {
    # Get this object instance.
    my ($self) = @_;
    # Get the FIG object.
    my $fig = $self->{fig};
    # Get the genome hash.
    my $genomeHash = $self->{genomes};
    # Convert the genome hash. We'll get the genus and species for each genome and make
    # it the key.
    my %speciesHash = map { $fig->genus_species($_) => $_ } (keys %{$genomeHash});
    # Create load objects for each of the tables we're loading.
    my $loadExternalAliasFunc = $self->_TableLoader('ExternalAliasFunc');
    my $loadExternalAliasOrg = $self->_TableLoader('ExternalAliasOrg');
    if ($self->{options}->{loadOnly}) {
        Trace("Loading from existing files.") if T(2);
    } else {
        Trace("Generating external data.") if T(2);
        # We loop through the files one at a time. First, the organism file.
        Open(\*ORGS, "sort +0 -1 -u -t\"\t\" $FIG_Config::global/ext_org.table |");
        my $orgLine;
        while (defined($orgLine = <ORGS>)) {
            # Clean the input line.
            chomp $orgLine;
            # Parse the organism name.
            my ($protID, $name) = split /\s*\t\s*/, $orgLine;
            $loadExternalAliasOrg->Put($protID, $name);
        }
        close ORGS;
        # Now the function file.
        my $funcLine;
        Open(\*FUNCS, "sort +0 -1 -u -t\"\t\" $FIG_Config::global/ext_func.table |");
        while (defined($funcLine = <FUNCS>)) {
            # Clean the line ending.
            chomp $funcLine;
            # Only proceed if the line is non-blank.
            if ($funcLine) {
                # Split it into fields.
                my @funcFields = split /\s*\t\s*/, $funcLine;
                # If there's an EC number, append it to the description.
                if ($#funcFields >= 2 && $funcFields[2] =~ /^(EC .*\S)/) {
                    $funcFields[1] .= " $1";
                }
                # Output the function line.
                $loadExternalAliasFunc->Put(@funcFields[0,1]);
            }
        }
        close FUNCS;
    }
    # Finish the load.
    my $retVal = $self->_FinishAll();
    return $retVal;
}


=head3 LoadReactionData

C<< my $stats = $spl->LoadReactionData(); >>

Load the reaction data from FIG into Sprout.

Reaction data connects reactions to the compounds that participate in them.

The following relations are loaded by this method.

    Reaction
    ReactionURL
    Compound
    CompoundName
    CompoundCAS
    IsAComponentOf

This method proceeds reaction by reaction rather than genome by genome.

=over 4

=item RETURNS

Returns a statistics object for the loads.

=back
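
The B<IsAComponentOf> records carry a discriminator so that a compound which
appears more than once in a reaction still yields distinct records. The
following stand-alone sketch shows the numbering. The reaction ID, the compound
IDs, and the C<%reaction> hash that stands in for the FIG object's
C<reaction2comp> method are hypothetical sample data.

```perl

    use strict;
    use warnings;

    # A single counter shared by every reaction in the load.
    my $discrim = 0;
    # Stand-in for the IsAComponentOf load file.
    my @rows;

    # Hypothetical reaction: substrates under key 0 and products under key 1,
    # each compound given as an [ID, stoichiometry, main-flag] tuple.
    my %reaction = (
        0 => [["C00001", 1, 1], ["C00002", 1, 1]],
        1 => [["C00001", 2, 0]],
    );
    for my $product (0, 1) {
        for my $compData (@{$reaction{$product}}) {
            my ($cid, $stoich, $main) = @{$compData};
            # The empty string is the location, which is not available.
            push @rows, [$cid, "R00001", $discrim++, "", $main, $product, $stoich];
        }
    }

```

Here compound C<C00001> appears as both substrate and product, and the two
records differ in their discriminator values.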

=cut
#: Return Type $%;
sub LoadReactionData {
    # Get this object instance.
    my ($self) = @_;
    # Get the FIG object.
    my $fig = $self->{fig};
    # Create load objects for each of the tables we're loading.
    my $loadReaction = $self->_TableLoader('Reaction');
    my $loadReactionURL = $self->_TableLoader('ReactionURL', $self->PrimaryOnly);
    my $loadCompound = $self->_TableLoader('Compound', $self->PrimaryOnly);
    my $loadCompoundName = $self->_TableLoader('CompoundName', $self->PrimaryOnly);
    my $loadCompoundCAS = $self->_TableLoader('CompoundCAS', $self->PrimaryOnly);
    my $loadIsAComponentOf = $self->_TableLoader('IsAComponentOf', $self->PrimaryOnly);
    if ($self->{options}->{loadOnly}) {
        Trace("Loading from existing files.") if T(2);
    } else {
        Trace("Generating reaction data.") if T(2);
        # First we create the compounds.
        my @compounds = $fig->all_compounds();
        for my $cid (@compounds) {
            # Check for names.
            my @names = $fig->names_of_compound($cid);
            # Each name will be given a priority number, starting with 1.
            my $prio = 1;
            for my $name (@names) {
                $loadCompoundName->Put($cid, $name, $prio++);
            }
            # Create the main compound record. Note that the first name
            # becomes the label.
            my $label = (@names > 0 ? $names[0] : $cid);
            $loadCompound->Put($cid, $label);
            # Check for a CAS ID.
            my $cas = $fig->cas($cid);
            if ($cas) {
                $loadCompoundCAS->Put($cid, $cas);
            }
        }
        # All the compounds are set up, so we need to loop through the reactions next. First,
        # we initialize the discriminator index. This is a single integer used to insure
        # duplicate elements in a reaction are not accidentally collapsed.
        my $discrim = 0;
        my @reactions = $fig->all_reactions();
        for my $reactionID (@reactions) {
            # Create the reaction record.
            $loadReaction->Put($reactionID, $fig->reversible($reactionID));
            # Compute the reaction's URL.
            my $url = HTML::reaction_link($reactionID);
            # Put it in the ReactionURL table.
            $loadReactionURL->Put($reactionID, $url);
            # Now we need all of the reaction's compounds. We get these in two phases,
            # substrates first and then products.
            for my $product (0, 1) {
                # Get the compounds of the current type for the current reaction. FIG will
                # give us 3-tuples: [ID, stoichiometry, main-flag]. At this time we do not
                # have location data in SEED, so it defaults to the empty string.
                my @compounds = $fig->reaction2comp($reactionID, $product);
                for my $compData (@compounds) {
                    # Extract the compound data from the current tuple.
                    my ($cid, $stoich, $main) = @{$compData};
                    # Link the compound to the reaction.
                    $loadIsAComponentOf->Put($cid, $reactionID, $discrim++, "", $main,
                                             $product, $stoich);
                }
            }
        }
    }
    # Finish the load.
    my $retVal = $self->_FinishAll();
    return $retVal;
}

=head3 LoadGroupData

C<< my $stats = $spl->LoadGroupData(); >>

Load the genome groups into Sprout.

The following relations are loaded by this method.

    GenomeGroups

Currently, we do not use groups, so no group records are generated. We used
to use them for NMPDR groups, but there is no direct support for genome groups
in FIG, so we accessed the SEED files directly.

=over 4

=item RETURNS

Returns a statistics object for the loads.

=back

=cut
#: Return Type $%;
sub LoadGroupData {
    # Get this object instance.
    my ($self) = @_;
    # Get the FIG object.
    my $fig = $self->{fig};
    # Get the genome hash.
    my $genomeHash = $self->{genomes};
    # Create a load object for the table we're loading.
    my $loadGenomeGroups = $self->_TableLoader('GenomeGroups');
    if ($self->{options}->{loadOnly}) {
        Trace("Loading from existing files.") if T(2);
    } else {
        Trace("Generating group data.") if T(2);
        # Currently there are no groups.
    }
    # Finish the load.
    my $retVal = $self->_FinishAll();
    return $retVal;
}

=head3 LoadSynonymData

C<< my $stats = $spl->LoadSynonymData(); >>

Load the synonym groups into Sprout.

The following relations are loaded by this method.

    SynonymGroup
    IsSynonymGroupFor

The source information for these relations is taken from the C<maps_to_id> method
of the B<FIG> object. Unfortunately, to make this work, we need to use direct
SQL against the FIG database.

=over 4

=item RETURNS

Returns a statistics object for the loads.

=back
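
Because the query returns its rows in synonym group order, a new group record
is needed only when the group ID changes from one row to the next. The
following stand-alone sketch shows that logic on in-memory rows. The sample
IDs are hypothetical, and a simple pattern match stands in for
C<FIG::genome_of>.

```perl

    use strict;
    use warnings;

    # The genomes being loaded.
    my %genomeHash = ("83333.1" => 1);
    # Hypothetical [group ID, feature ID] rows, already sorted by group ID.
    my @rows = (
        ["gi|1", "fig|83333.1.peg.1"],
        ["gi|1", "fig|83333.1.peg.2"],
        ["gi|1", "fig|99999.1.peg.7"],
        ["gi|2", "fig|83333.1.peg.3"],
    );
    my $current_syn = "";
    # Stand-ins for the SynonymGroup and IsSynonymGroupFor load files.
    my (@groups, @links);
    for my $row (@rows) {
        my ($syn_id, $peg) = @{$row};
        # Stand-in for FIG::genome_of: pull the genome ID from the feature ID.
        my ($genomeID) = $peg =~ /^fig\|(\d+\.\d+)\./;
        next if ! defined $genomeID || ! exists $genomeHash{$genomeID};
        if ($syn_id ne $current_syn) {
            # The group ID changed, so this is a new group.
            push @groups, $syn_id;
            $current_syn = $syn_id;
        }
        push @links, [$syn_id, $peg];
    }

```

A group whose features all belong to other genomes never passes the genome
filter, so it produces no group record at all.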

=cut
#: Return Type $%;
sub LoadSynonymData {
    # Get this object instance.
    my ($self) = @_;
    # Get the FIG object.
    my $fig = $self->{fig};
    # Get the genome hash.
    my $genomeHash = $self->{genomes};
    # Create a load object for the table we're loading.
    my $loadSynonymGroup = $self->_TableLoader('SynonymGroup');
    my $loadIsSynonymGroupFor = $self->_TableLoader('IsSynonymGroupFor');
    if ($self->{options}->{loadOnly}) {
        Trace("Loading from existing files.") if T(2);
    } else {
        Trace("Generating synonym group data.") if T(2);
        # Get the database handle.
        my $dbh = $fig->db_handle();
        # Ask for the synonyms.
        my $sth = $dbh->prepare_command("SELECT maps_to, syn_id FROM peg_synonyms ORDER BY maps_to");
        my $result = $sth->execute();
        if (! defined($result)) {
            Confess("Database error in Synonym load: " . $sth->errstr());
        } else {
            # Remember the current synonym.
            my $current_syn = "";
            # Count the features.
            my $featureCount = 0;
            # Loop through the synonym/peg pairs.
            while (my @row = $sth->fetchrow()) {
                # Get the synonym ID and feature ID.
                my ($syn_id, $peg) = @row;
                # Insure it's for one of our genomes.
                my $genomeID = FIG::genome_of($peg);
                if (exists $genomeHash->{$genomeID}) {
                    # Verify the synonym.
                    if ($syn_id ne $current_syn) {
                        # It's new, so put it in the group table.
                        $loadSynonymGroup->Put($syn_id);
                        $current_syn = $syn_id;
                    }
                    # Connect the synonym to the peg.
                    $loadIsSynonymGroupFor->Put($syn_id, $peg);
                    # Count this feature.
                    $featureCount++;
                    if ($featureCount % 1000 == 0) {
                        Trace("$featureCount features processed.") if T(3);
                    }
                }
            }
        }
    }
    # Finish the load.
    my $retVal = $self->_FinishAll();
    return $retVal;
}

=head3 LoadFamilyData

C<< my $stats = $spl->LoadFamilyData(); >>

Load the protein families into Sprout.

The following relations are loaded by this method.

    Family
    IsFamilyForFeature

The source information for these relations is taken from the C<families_for_protein>,
C<family_function>, and C<sz_family> methods of the B<FIG> object.

=over 4

=item RETURNS

Returns a statistics object for the loads.

=back

=cut
#: Return Type $%;
sub LoadFamilyData {
    # Get this object instance.
    my ($self) = @_;
    # Get the FIG object.
    my $fig = $self->{fig};
    # Get the genome hash.
    my $genomeHash = $self->{genomes};
    # Create load objects for the tables we're loading.
    my $loadFamily = $self->_TableLoader('Family');
    my $loadIsFamilyForFeature = $self->_TableLoader('IsFamilyForFeature');
    if ($self->{options}->{loadOnly}) {
        Trace("Loading from existing files.") if T(2);
    } else {
        Trace("Generating family data.") if T(2);
        # Create a hash for the family IDs.
        my %familyHash = ();
        # Loop through the genomes.
        for my $genomeID (sort keys %{$genomeHash}) {
            Trace("Processing features for $genomeID.") if T(2);
            # Loop through this genome's PEGs.
            for my $fid ($fig->all_features($genomeID, "peg")) {
                $loadIsFamilyForFeature->Add("features", 1);
                # Get this feature's families.
                my @families = $fig->families_for_protein($fid);
                # Loop through the families, connecting them to the feature.
                for my $family (@families) {
                    $loadIsFamilyForFeature->Put($family, $fid);
                    # If this is a new family, create a record for it.
                    if (! exists $familyHash{$family}) {
                        $familyHash{$family} = 1;
                        $loadFamily->Add("families", 1);
                        my $size = $fig->sz_family($family);
                        my $func = $fig->family_function($family);
                        $loadFamily->Put($family, $size, $func);
                    }
                }
            }
        }
    }
    # Finish the load.
    my $retVal = $self->_FinishAll();
    return $retVal;
}

=head3 LoadDrugData

C<< my $stats = $spl->LoadDrugData(); >>

Load the drug target data into Sprout.

The following relations are loaded by this method.

    DrugProject
    ContainsTopic
    DrugTopic
    ContainsAnalysisOf
    PDB
    IncludesBound
    IsBoundIn
    BindsWith
    Ligand
    DescribesProteinForFeature
    FeatureConservation

The source information for these relations is taken from flat files in the
C<$FIG_Config::drug_directory>. The file C<master_tables.list> contains
a list of drug project names paired with file names. The named file (in the
same directory) contains all the data for the project.

=over 4

=item RETURNS

Returns a statistics object for the loads.

=back
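
The project list is turned into a hash by splitting each line on the TAB character.
The following minimal sketch shows the technique on invented project and file
names. It assumes every line has exactly two fields, since a line without a TAB
would shift the key/value pairing of everything after it.

```perl

    use strict;
    use warnings;

    # Invented lines in the shape described above: project name, TAB, file name.
    my @lines = ("ProjectA\tprojectA.table", "ProjectB\tprojectB.table");
    # Each line splits into two fields, so the flattened list forms key/value pairs.
    my %projects = map { split /\t/, $_ } @lines;
    # Sorting the keys gives a predictable processing order.
    for my $project (sort keys %projects) {
        print "$project => $projects{$project}\n";
    }

```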

=cut
#: Return Type $%;
sub LoadDrugData {
    # Get this object instance.
    my ($self) = @_;
    # Get the FIG object.
    my $fig = $self->{fig};
    # Get the genome hash.
    my $genomeHash = $self->{genomes};
    # Create load objects for the tables we're loading.
    my $loadDrugProject = $self->_TableLoader('DrugProject');
    my $loadContainsTopic = $self->_TableLoader('ContainsTopic');
    my $loadDrugTopic = $self->_TableLoader('DrugTopic');
    my $loadContainsAnalysisOf = $self->_TableLoader('ContainsAnalysisOf');
    my $loadPDB = $self->_TableLoader('PDB');
    my $loadIncludesBound = $self->_TableLoader('IncludesBound');
    my $loadIsBoundIn = $self->_TableLoader('IsBoundIn');
    my $loadBindsWith = $self->_TableLoader('BindsWith');
    my $loadLigand = $self->_TableLoader('Ligand');
    my $loadDescribesProteinForFeature = $self->_TableLoader('DescribesProteinForFeature');
    my $loadFeatureConservation = $self->_TableLoader('FeatureConservation');
    if ($self->{options}->{loadOnly}) {
        Trace("Loading from existing files.") if T(2);
    } else {
        Trace("Generating drug target data.") if T(2);
        # Load the project list. The file comes in as a list of chomped lines,
        # and we split them on the TAB character to make the project name the
        # key and the file name the value of the resulting hash.
        my %projects = map { split /\t/, $_ } Tracer::GetFile("$FIG_Config::drug_directory/master_tables.list");
        # Create hashes for the derived objects: PDBs, Features, and Ligands. These objects
        # may occur multiple times in a single project file or even in multiple project
        # files.
        my %ligands = ();
        my %pdbs = ();
        my %features = ();
        my %bindings = ();
        # Set up a counter for drug topics. This will be used as the key.
        my $topicCounter = 0;
        # Loop through the projects. We sort the keys not because we need them sorted, but
        # because it makes it easier to infer our progress from trace messages.
        for my $project (sort keys %projects) {
            Trace("Processing project $project.") if T(3);
            # Only proceed if the download file exists.
            my $projectFile = "$FIG_Config::drug_directory/$projects{$project}";
            if (! -f $projectFile) {
                Trace("Project file $projectFile not found.") if T(0);
            } else {
                # Create the project record.
                $loadDrugProject->Put($project);
                # Create a hash for the topics. Each project has one or more topics. The
                # topic is identified by a URL, a category, and an identifier.
                my %topics = ();
                # Now we can open the project file.
                Trace("Reading project file $projectFile.") if T(3);
                Open(\*PROJECT, "<$projectFile");
                # Get the first record, which is a list of column headers. We don't use this
                # for anything, but it may be useful for debugging.
                my $headerLine = <PROJECT>;
                # Loop through the rest of the records.
                while (! eof PROJECT) {
                    # Get the current line of data. Note that not all lines will have all
                    # the fields. In particular, the CLIBE data is fairly rare.
                    my ($authorOrganism, $category, $tag, $refURL, $peg, $conservation,
                        $pdbBound, $pdbBoundEval, $pdbFree, $pdbFreeEval, $pdbFreeTitle,
                        $protDistInfo, $passAspInfo, $passAspFile, $passWeightInfo,
                        $passWeightFile, $clibeInfo, $clibeURL, $clibeTotalEnergy,
                        $clibeVanderwaals, $clibeHBonds, $clibeEI, $clibeSolvationE)
                       = Tracer::GetLine(\*PROJECT);
                    # The tag contains an identifier for the current line of data followed
                    # by a text statement that generally matches a property name in the
                    # main database. We split it up, since the identifier goes with
                    # the PDB data and the text statement is part of the topic.
                    my ($lineID, $topicTag) = split /\s*,\s*/, $tag;
                    $loadDrugProject->Add("data line");
                    # Check for a new topic.
                    my $topicData = "$category\t$topicTag\t$refURL";
                    if (! exists $topics{$topicData}) {
                        # Here we have a new topic. Compute its ID.
                        $topicCounter++;
                        $topics{$topicData} = $topicCounter;
                        # Create its database record.
                        $loadDrugTopic->Put($topicCounter, $refURL, $category, $authorOrganism,
                                            $topicTag);
                        # Connect it to the project.
                        $loadContainsTopic->Put($project, $topicCounter);
                        $loadDrugTopic->Add("topic");
                    }
                    # Now we know the topic ID exists in the hash and the topic will
                    # appear in the database, so we get this topic's ID.
                    my $topicID = $topics{$topicData};
                    # If the feature in this line is new, we need to save its conservation
                    # number.
                    if (! exists $features{$peg}) {
                        $loadFeatureConservation->Put($peg, $conservation);
                        $features{$peg} = 1;
                    }
                    # Now we have two PDBs to deal with-- a bound PDB and a free PDB.
                    # The free PDB will have data about docking points; the bound PDB
                    # will have data about docking. We store both types as PDBs, and
                    # the special data comes from relationships. First we process the
                    # bound PDB.
                    if ($pdbBound) {
                        $loadPDB->Add("bound line");
                        # Insure this PDB is in the database.
                        $self->CreatePDB($pdbBound, lc "$pdbFreeTitle (bound)", "bound", \%pdbs, $loadPDB);
                        # Connect it to this topic.
                        $loadIncludesBound->Put($topicID, $pdbBound);
                        # Check for CLIBE data.
                        if ($clibeInfo) {
                            $loadLigand->Add("clibes");
                            # We have CLIBE data, so we create a ligand and relate it to the PDB.
                            if (! exists $ligands{$clibeInfo}) {
                                # This is a new ligand, so create its record.
                                $loadLigand->Put($clibeInfo);
                                $loadLigand->Add("ligand");
                                # Make sure we know this ligand already exists.
                                $ligands{$clibeInfo} = 1;
                            }
                            # Now connect the PDB to the ligand using the CLIBE data.
                            $loadBindsWith->Put($pdbBound, $clibeInfo, $clibeURL, $clibeHBonds, $clibeEI,
                                                $clibeSolvationE, $clibeVanderwaals);
                        }
                        # Connect this PDB to the feature.
                        $loadDescribesProteinForFeature->Put($pdbBound, $peg, $protDistInfo, $pdbBoundEval);
                    }
                    # Next is the free PDB.
                    if ($pdbFree) {
                        $loadPDB->Add("free line");
                        # Insure this PDB is in the database.
                        $self->CreatePDB($pdbFree, lc $pdbFreeTitle, "free", \%pdbs, $loadPDB);
                        # Connect it to this topic.
                        $loadContainsAnalysisOf->Put($topicID, $pdbFree, $passAspInfo,
                                                     $passWeightFile, $passWeightInfo, $passAspFile);
                        # Connect this PDB to the feature.
                        $loadDescribesProteinForFeature->Put($pdbFree, $peg, $protDistInfo, $pdbFreeEval);
                    }
                    # If we have both PDBs, we may need to link them.
                    if ($pdbFree && $pdbBound) {
                        $loadIsBoundIn->Add("connection");
                        # Insure we only link them once.
                        my $bindingKey =  "$pdbFree\t$pdbBound";
                        if (! exists $bindings{$bindingKey}) {
                            $loadIsBoundIn->Add("newConnection");
                            $loadIsBoundIn->Put($pdbFree, $pdbBound);
                            $bindings{$bindingKey} = 1;
                        }
                    }
                }
                # Close off this project.
                close PROJECT;
            }
        }
    }
    # Finish the load.
    my $retVal = $self->_FinishAll();
    return $retVal;
}


=head2 Internal Utility Methods

=head3 SpecialAttribute

C<< my $count = SproutLoad::SpecialAttribute($id, \@attributes, $idxMatch, \@idxValues, $pattern, $loader); >>

Look for special attributes of a given type. A special attribute is found by comparing one of
the columns of the incoming attribute list to a search pattern. If a match is found, then
a set of columns is put into an output table connected to the specified ID.

For example, when processing features, the attribute list we look at has three columns: attribute
name, attribute value, and attribute value HTML. The IEDB attribute exists if the attribute name
begins with C<iedb_>. The call signature is therefore

    my $found = SpecialAttribute($fid, \@attributeList, 0, [0,2], '^iedb_', $loadFeatureIEDB);

The pattern is matched against column 0, and if we have a match, then the values of
columns 0 and 2 are joined with a space and put to the output along with the specified
feature ID.

=over 4

=item id

ID of the object whose special attributes are being loaded. This forms the first column of the
output.

=item attributes

Reference to a list of tuples.

=item idxMatch

Index in each tuple of the column to be matched against the pattern. If the match is
successful, an output record will be generated.

=item idxValues

Reference to a list containing the indexes in each tuple of the columns to be put to
the output. The selected values are joined with a single space to form the second
column of the output.

=item pattern

Pattern to be matched against the specified column. The match will be case-insensitive.

=item loader

An object to which each output record will be put. Usually this is an B<ERDBLoad> object,
but technically it could be anything with a C<Put> method.

=item RETURN

Returns a count of the matches found.

=back
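
The matching rule can be tried in isolation. The following is a minimal sketch, not
the code of this module: C<MockLoader> is a hypothetical stand-in for B<ERDBLoad>
that only records what is put to it, C<special_attribute_sketch> repeats the loop
described above without the trace call, and the attribute rows are invented.

```perl

    use strict;
    use warnings;

    # Hypothetical stand-in for ERDBLoad: it only remembers the rows put to it.
    package MockLoader;
    sub new { my ($class) = @_; return bless({ rows => [] }, $class); }
    sub Put { my ($self, @fields) = @_; push @{$self->{rows}}, [@fields]; }

    package main;

    # The matching rule described above, minus the tracing.
    sub special_attribute_sketch {
        my ($id, $attributes, $idxMatch, $idxValues, $pattern, $loader) = @_;
        my $count = 0;
        for my $row (@{$attributes}) {
            if ($row->[$idxMatch] =~ m/$pattern/i) {
                # Join the requested columns into a single output value.
                $loader->Put($id, join(" ", map { $row->[$_] } @{$idxValues}));
                $count++;
            }
        }
        return $count;
    }

    # Invented sample data: attribute name, value, value HTML.
    my @attributeList = (
        ['iedb_epitope', 'E1',  '<b>E1</b>'],
        ['essential',    'yes', 'yes'],
        ['IEDB_assay',   'A7',  '<i>A7</i>'],
    );
    my $loader = MockLoader->new();
    my $found = special_attribute_sketch('fig|83333.1.peg.4', \@attributeList, 0, [0, 2],
                                         '^iedb_', $loader);
    print "$found matches\n";
    print join("\t", @{$_}), "\n" for @{$loader->{rows}};

```

The third row matches even though its name is in upper case, because the pattern is
applied case-insensitively.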

=cut

sub SpecialAttribute {
    # Get the parameters.
    my ($id, $attributes, $idxMatch, $idxValues, $pattern, $loader) = @_;
    # Declare the return variable.
    my $retVal = 0;
    # Loop through the attribute rows.
    for my $row (@{$attributes}) {
        # Check for a match.
        if ($row->[$idxMatch] =~ m/$pattern/i) {
            # We have a match, so output a row. This is a bit tricky, since we may
            # be putting out multiple columns of data from the input.
            my $value = join(" ", map { $row->[$_] } @{$idxValues});
            $loader->Put($id, $value);
            $retVal++;
        }
    }
    Trace("$retVal special attributes found for $id and loader " . $loader->RelName() . ".") if T(4) && $retVal;
    # Return the number of matches.
    return $retVal;
}

=head3 CreatePDB

C<< $spl->CreatePDB($pdbID, $title, $type, \%pdbHash, $pdbLoader); >>

Insure that a PDB record exists for the identified PDB. If one does not exist, it will be
created.

=over 4

=item pdbID

ID string (usually an unqualified file name) for the desired PDB.

=item title

Title to use if the PDB must be created.

=item type

Type of PDB: C<free> or C<bound>

=item pdbHash

Hash containing the IDs of PDBs that have already been created.

=item pdbLoader

Load object for the PDB table.

=back

=cut

sub CreatePDB {
    # Get the parameters.
    my ($self, $pdbID, $title, $type, $pdbHash, $pdbLoader) = @_;
    $pdbLoader->Add("PDB check");
    # Check to see if this is a new PDB.
    if (! exists $pdbHash->{$pdbID}) {
        # It is, so we create it.
        $pdbLoader->Put($pdbID, $title, $type);
        $pdbHash->{$pdbID} = 1;
        # Count it.
        $pdbLoader->Add("PDB-$type");
    }
}

=head3 TableLoader

C<< my $loader = $self->_TableLoader($tableName, $ignore); >>

Create an ERDBLoad object for the specified table. The object is also added to
the internal list in the C<loaders> property of this object. That enables the
L</FinishAll> method to terminate all the active loads.

This is an instance method.

=over 4

=item tableName

Name of the table (relation) being loaded.

=item ignore

TRUE if the table should be ignored entirely, else FALSE.

=item RETURN

Returns an ERDBLoad object for loading the specified table.

=back

=cut

sub _TableLoader {
    # Get the parameters.
    my ($self, $tableName, $ignore) = @_;
    # Create the load object.
    my $retVal = ERDBLoad->new($self->{erdb}, $tableName, $self->{loadDirectory}, $self->LoadOnly,
                               $ignore);
    # Cache it in the loader list.
    push @{$self->{loaders}}, $retVal;
    # Return it to the caller.
    return $retVal;
}

=head3 FinishAll

C<< my $stats = $self->_FinishAll(); >>

Finish all the active loads on this object.

When a load is started by L</TableLoader>, the controlling B<ERDBLoad> object is cached in
the list pointed to by the C<loaders> property of this object. This method pops the loaders
off the list and finishes them to flush out any accumulated residue.

This is an instance method.

=over 4

=item RETURN

Returns a statistics object containing the accumulated statistics for the load.

=back
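
The method works in two passes: every loader is finished first, and only then are
the tables loaded, so a failure during the database load leaves all the load files
usable. The following is a minimal sketch of that ordering under stated assumptions.
C<SketchLoader> is a hypothetical class whose C<Finish> simply returns a row count,
and the database load is reduced to a log entry.

```perl

    use strict;
    use warnings;

    # Hypothetical loader: Finish() stands for closing the load file and
    # reports a row count.
    package SketchLoader;
    sub new {
        my ($class, $name, $rows) = @_;
        return bless({ name => $name, rows => $rows }, $class);
    }
    sub RelName { return $_[0]->{name}; }
    sub Finish  { return $_[0]->{rows}; }

    package main;

    my @loaders = (SketchLoader->new('Family', 3),
                   SketchLoader->new('IsFamilyForFeature', 7));
    my @log;
    # Pass 1: finish every loader so that all the load files are complete.
    my %finished;
    while (my $loader = pop @loaders) {
        $finished{$loader->RelName} = $loader->Finish();
        push @log, "finish " . $loader->RelName;
    }
    # Pass 2: only now touch the database, one relation at a time.
    my $total = 0;
    for my $relName (sort keys %finished) {
        push @log, "load $relName";
        $total += $finished{$relName};
    }
    print join("\n", @log), "\n$total rows\n";

```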

=cut

sub _FinishAll {
    # Get this object instance.
    my ($self) = @_;
    # Create the statistics object.
    my $retVal = Stats->new();
    # Get the loader list.
    my $loadList = $self->{loaders};
    # Create a hash to hold the statistics objects, keyed on relation name.
    my %loaderHash = ();
    # Loop through the list, finishing the loads. Note that if the finish fails, we die
    # ignominiously. At some future point, we want to make the loads more restartable.
    while (my $loader = pop @{$loadList}) {
        # Get the relation name.
        my $relName = $loader->RelName;
        # Check the ignore flag.
        if ($loader->Ignore) {
            Trace("Relation $relName not loaded.") if T(2);
        } else {
            # Here we really need to finish.
            Trace("Finishing $relName.") if T(2);
            my $stats = $loader->Finish();
            $loaderHash{$relName} = $stats;
        }
    }
    # Now we loop through again, actually loading the tables. We want to finish before
    # loading so that if something goes wrong at this point, all the load files are usable
    # and we don't have to redo all that work.
    for my $relName (sort keys %loaderHash) {
        # Get the statistics for this relation.
        my $stats = $loaderHash{$relName};
        # Check for a database load.
        if ($self->{options}->{dbLoad}) {
            # Here we want to use the load file just created to load the database.
            Trace("Loading relation $relName.") if T(2);
            my $newStats = $self->{sprout}->LoadUpdate(1, [$relName]);
            # Accumulate the statistics from the DB load.
            $stats->Accumulate($newStats);
        }
        $retVal->Accumulate($stats);
        Trace("Statistics for $relName:\n" . $stats->Show()) if T(2);
    }
    # Return the load statistics.
    return $retVal;
}

=head3 GetGenomeAttributes

C<< my $aHashRef = GetGenomeAttributes($fig, $genomeID, \@fids); >>

Return a hash of attributes keyed on feature ID. This method gets all the attributes
for all the features of a genome in a single call, then organizes them into a hash.

=over 4

=item fig

FIG-like object for accessing attributes.

=item genomeID

ID of the genome whose attributes are desired.

=item fids

Reference to a list of the feature IDs whose attributes are to be kept.

=item RETURN

Returns a reference to a hash. The key of the hash is the feature ID. The value is the
reference to a list of the feature's attribute tuples. Each tuple contains the feature ID,
the attribute key, and one or more attribute values.

=back
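
The shape of the returned hash can be illustrated without a FIG object. The
following minimal sketch uses invented attribute tuples in place of the
C<get_attributes> result. Every requested feature gets a list reference, even when
it has no attributes, and tuples for features that were not requested are dropped.

```perl

    use strict;
    use warnings;

    # Invented attribute tuples: feature ID, attribute key, attribute value.
    my @aList = (
        ['fig|83333.1.peg.1', 'essential',    'yes'],
        ['fig|83333.1.peg.3', 'essential',    'no'],
        ['fig|83333.1.peg.1', 'iedb_epitope', 'E1'],
    );
    # The features whose attributes we want to keep.
    my @fids = ('fig|83333.1.peg.1', 'fig|83333.1.peg.2');
    # Pre-seed every wanted feature with an empty list.
    my %byFid;
    $byFid{$_} = [] for @fids;
    # File each tuple under its feature ID, skipping unwanted features.
    for my $tuple (@aList) {
        my $fid = $tuple->[0];
        push @{$byFid{$fid}}, $tuple if exists $byFid{$fid};
    }
    for my $fid (sort keys %byFid) {
        print "$fid: ", scalar(@{$byFid{$fid}}), " tuples\n";
    }

```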

=cut

sub GetGenomeAttributes {
    # Get the parameters.
    my ($fig, $genomeID, $fids) = @_;
    # Declare the return variable.
    my $retVal = {};
    # Get the attributes.
    my @aList = $fig->get_attributes("fig|$genomeID%");
    # Initialize the hash. This not only enables us to easily determine which FIDs to
    # keep, it insures that the caller sees a list reference for every known fid,
    # simplifying the logic.
    for my $fid (@{$fids}) {
        $retVal->{$fid} = [];
    }
    # Populate the hash.
    for my $aListEntry (@aList) {
        my $fid = $aListEntry->[0];
        if (exists $retVal->{$fid}) {
            push @{$retVal->{$fid}}, $aListEntry;
        }
    }
    # Return the result.
    return $retVal;
}

1;
