View of /Sprout/LoaderUtils.pm


Revision 1.2
Mon Feb 1 20:14:28 2010 UTC (9 years, 7 months ago) by parrello
Branch: MAIN
CVS Tags: rast_rel_2010_0928, rast_rel_2010_0526, rast_rel_2010_1206, rast_rel_2011_0119, rast_rel_2010_0827
Changes since 1.1: +51 -0 lines
Sapling model updates.

#!/usr/bin/perl -w

package LoaderUtils;

    use strict;
    use Tracer;
    use SeedUtils;

=head1 Common DB Load Utilities

=head2 Introduction

This package contains static methods used by both the Sprout and Sapling loaders.

=head2 Public Methods

=head3 ReadAliasFile

    my $aliasHash = LoaderUtils::ReadAliasFile($dir, $genomeID);

This method reads the content of the alias file for the specified genome,
and returns a hash. For each feature, the hash contains a list of its
aliases. Each alias is represented by a 3-tuple consisting of the actual
alias, the alias type (e.g. C<CMR>, C<NCBI>), and the confidence code:
C<A> for a curated alias, C<B> for a non-curated feature alias, and C<C>
for a protein alias. If the alias file is not found, an undefined value is
returned instead of a hash reference.

=over 4

=item dir

Name of the directory containing the alias files.

=item genomeID

ID of the genome whose alias file is to be read.

=item RETURN

Returns a reference to a hash of feature IDs to alias lists. For each feature,
the alias list will be a reference to a list of 3-tuples. Each 3-tuple will
contain an alias ID, an alias type, and a confidence level from C<A> (highest)
to C<C> (lowest). If the alias file is not found, it will return an undefined
value.

=back
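
The following is a minimal, self-contained sketch of the same idea using only
core Perl, intended as an illustration rather than a replacement. It assumes
that each alias record is a single tab-delimited line holding the feature ID,
the alias, the alias type, and the confidence code; that layout is inferred
from the four fields read per record and is not otherwise confirmed here. The
name C<read_alias_file_sketch> and all sample IDs are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical stand-in for ReadAliasFile that avoids Tracer.
# Assumed record layout (one per line, tab-delimited):
#   featureID <TAB> alias <TAB> aliasType <TAB> confidence
sub read_alias_file_sketch {
    my ($dir, $genomeID) = @_;
    my $aliasFile = "$dir/alias.$genomeID.tbl";
    # Mirror the documented behavior: undef when the file is missing.
    return undef if ! -f $aliasFile;
    open(my $aliasH, '<', $aliasFile) or die "Cannot open $aliasFile: $!";
    my %aliases;
    while (my $line = <$aliasH>) {
        chomp $line;
        next if $line eq '';
        my ($fid, $aliasID, $aliasType, $aliasConf) = split /\t/, $line;
        # Each feature maps to a list of [alias, type, confidence] tuples.
        push @{$aliases{$fid}}, [$aliasID, $aliasType, $aliasConf];
    }
    close $aliasH;
    return \%aliases;
}

# Build a small sample file (made-up data) in a temporary directory.
my $dir = tempdir(CLEANUP => 1);
my $genomeID = '0.0';
open(my $out, '>', "$dir/alias.$genomeID.tbl") or die "Cannot write sample: $!";
print $out join("\t", 'fig|0.0.peg.1', 'EXAMPLE_1', 'CMR',  'A'), "\n";
print $out join("\t", 'fig|0.0.peg.1', 'EXAMPLE_2', 'NCBI', 'B'), "\n";
close $out;

# Read it back and walk the hash of 3-tuples.
my $aliasHash = read_alias_file_sketch($dir, $genomeID);
for my $fid (sort keys %$aliasHash) {
    for my $tuple (@{$aliasHash->{$fid}}) {
        my ($aliasID, $aliasType, $aliasConf) = @$tuple;
        print "$fid\t$aliasID ($aliasType, confidence $aliasConf)\n";
    }
}
```

Because a missing file yields an undefined value rather than an empty hash,
callers should test the result with C<defined> before dereferencing it.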

=cut

sub ReadAliasFile {
    # Get the parameters.
    my ($dir, $genomeID) = @_;
    # Declare the return variable.
    my $retVal = {};
    # Find the alias file. The alias files are created by "AliasCrunch.pl".
    my $aliasFile = "$dir/alias.$genomeID.tbl";
    if (! -f $aliasFile) {
        undef $retVal;
    } else {
        # The file exists, so open it for input.
        my $aliasH = Open(undef, "<$aliasFile");
        # Loop through the file.
        while (! eof $aliasH) {
            # Get this alias record.
            my ($aliasFid, $aliasID, $aliasType, $aliasConf) = Tracer::GetLine($aliasH);
            # Put it in the return hash.
            push @{$retVal->{$aliasFid}}, [$aliasID, $aliasType, $aliasConf];
        }
        # Close the file: we're done with it.
        close $aliasH;
        # Do a memory trace. Alias files can be pretty big.
        MemTrace("Aliases adjusted.") if T(ERDBLoadGroup => 3);
    }
    # Return the result.
    return $retVal;
}

=head3 RolesForLoading

    my ($roles, $errors) = LoaderUtils::RolesForLoading($function);

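Split a functional assignment into roles. The assignment is considered
suspicious, and is rejected as a whole, if it contains a percent sign, the
string C<E=>, the word C<blast>, or a word beginning with C<similarit>,
C<fasta>, or C<identity> (all case-insensitive). A count is also returned
of the number of roles rejected for being longer than 250 characters.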

=over 4

=item function

Functional assignment to parse.

=item RETURN

Returns a two-element list. The first is either a reference to a list of
roles, or an undefined value (indicating a suspicious functional assignment).
The second is the number of roles rejected for being longer than 250
characters.

=back
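
The two-stage filtering can be illustrated with a small standalone sketch. It
reuses the screening pattern and the 250-character limit from the method
itself, but C<split_roles_sketch> is a hypothetical, simplified stand-in for
C<SeedUtils::roles_of_function>, whose real splitting rules may differ. The
sample functional assignments are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for SeedUtils::roles_of_function (an assumption):
# splits on " / ", " @ ", and "; ".
sub split_roles_sketch {
    my ($function) = @_;
    return split m{\s+[/\@]\s+|\s*;\s+}, $function;
}

sub roles_for_loading_sketch {
    my ($function) = @_;
    my ($roles, $errors) = (undef, 0);
    # Same screening pattern as RolesForLoading: reject the whole assignment
    # if it looks like similarity-search residue.
    if ($function !~ /\b(?:similarit|blast\b|fasta|identity)|%|E=/i) {
        $roles = [];
        for my $role (split_roles_sketch($function)) {
            if (length($role) > 250) {
                # Too long to load: count it instead of keeping it.
                $errors++;
            } else {
                push @$roles, $role;
            }
        }
    }
    return ($roles, $errors);
}

# Made-up inputs: a clean assignment, a suspicious one, and one with an
# over-long role.
my @samples = (
    "Alpha kinase / Beta synthase",
    "42% identity to hypothetical protein",
    ("x" x 251) . " / Gamma reductase",
);
for my $function (@samples) {
    my ($roles, $errors) = roles_for_loading_sketch($function);
    my $shown = defined $roles ? join(" | ", @$roles) : "(rejected as suspicious)";
    print "$shown; too long: $errors\n";
}
```

Note that an undefined first element (suspicious assignment) is distinct from
a reference to an empty list (every role was too long), so callers should
check C<defined> rather than truth.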

=cut

sub RolesForLoading {
    # Get the parameters.
    my ($function) = @_;
    # Declare the return variables.
    my ($roles, $errors) = (undef, 0);
    # Only proceed if there are no suspicious elements in the functional assignment.
    if (! ($function =~ /\b(?:similarit|blast\b|fasta|identity)|%|E=/i)) {
        # Initialize the return list.
        $roles = [];
        # Split the function into roles.
        my @roles = roles_of_function($function);
        # Keep only the good roles.
        for my $role (@roles) {
            if (length($role) > 250) {
                $errors++;
            } else {
                push @$roles, $role;
            }
        }
    }
    # Return the results.
    return ($roles, $errors);
}



1;
