View of /Sprout/SaplingLoader.pl


Revision 1.8
Wed May 8 20:28:44 2013 UTC by parrello
Branch: MAIN
CVS Tags: rast_rel_2014_0912, rast_rel_2014_0729, HEAD
Changes since 1.7: +4 -3 lines
Updates for new incremental load.

#!/usr/bin/perl -w

=head1 Sapling Incremental Load Script

This script reads a file of instructions for updating a Sapling database. The
input file should be in tab-delimited format. The first column contains the
load type, and the remaining columns the parameters of the command. The
input file name will be taken from the first positional parameter. If no
parameter is provided, the input will be taken from the standard input.

Blank input lines will be ignored.

A line beginning with a hash mark (C<#>) is treated as a comment and echoed to
the trace output.
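
As an illustration of the input format described above, here is a minimal
sketch of how one instruction line could be classified as blank, comment, or
command. It is not the script's own parser (the script reads its lines through
C<Tracer::GetLine>); the helper name C<ClassifyLine>, the genome ID, and the
directory path are hypothetical and used only for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: split one tab-delimited instruction line and
# report whether it is blank, a comment, or a command.
sub ClassifyLine {
    my ($line) = @_;
    # Remove the line terminator (handles both LF and CRLF).
    $line =~ s/\r?\n\z//;
    # The -1 limit keeps trailing empty fields.
    my ($command, @parms) = split /\t/, $line, -1;
    if (! defined $command || $command eq '') {
        return ('blank');
    } elsif ($command =~ /^#/) {
        # Comment fields are rejoined with spaces for display.
        return ('comment', join(' ', $command, @parms));
    } else {
        return ('command', $command, @parms);
    }
}

# Illustrative Genome command with a made-up directory name.
my @result = ClassifyLine("Genome\t83333.1\t/vol/genomes/83333.1\n");
print join('|', @result), "\n";
```

The first element of the returned list tells the caller which of the three
cases applies; for a command, the remaining elements are the command name and
its parameters.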

The supported commands are as follows.

=over 4

=item Expression

Reload the expression data for a genome. The parameters are (1) the ID of the
genome whose expression data is to be loaded or replaced and (2) the name of a
directory containing the data.

=item Function

Update the functional assignments of one or more features. The parameter is
the name of a tab-delimited file containing the functional assignment changes.
Each record in the file must contain a feature ID, the ID of the user who
made the change, and the new functional assignment.

=item Genome

Reload the sequence and annotation data for a genome. The parameters are (1) the
ID of the genome whose data is to be loaded or replaced and (2) the name of a
SEED genome directory containing the data. If the directory name is missing,
the genome will be deleted.

=item Subsystem

Reload the data for a subsystem. The parameters are (1) the name of the subsystem
to be loaded or replaced and (2) the name of a SEED subsystem directory containing
the data. If the directory name is missing, the subsystem will be deleted.

=item Taxonomy

Reload the taxonomy data in the database. The parameters are (1) the name of a
directory containing the NCBI taxonomy files and (2) the name of a file containing
the OTU specifications.

=back
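
Each command name corresponds to a loader package named
C<Sapling>I<Command>C<Loader>, whose C<Process> function receives the database
object followed by the command parameters. The following is a minimal,
self-contained sketch of that kind of name-based dispatch, written with C<can>
instead of a string C<eval>. The C<DemoGenomeLoader> and C<DemoTaxonomyLoader>
packages and the C<Dispatch> helper are stand-ins invented for illustration;
they are not part of Sapling and return strings rather than statistics objects.

```perl
use strict;
use warnings;

# Stand-in loader packages (hypothetical; the real loaders are the
# Sapling*Loader modules).
package DemoGenomeLoader;
sub Process {
    my ($db, $genomeID, $directory) = @_;
    # A missing directory name means the genome is deleted.
    return defined $directory ? "load $genomeID from $directory"
                              : "delete $genomeID";
}

package DemoTaxonomyLoader;
sub Process {
    my ($db, $taxDirectory, $otuFile) = @_;
    return "taxonomy from $taxDirectory with $otuFile";
}

package main;

# Hypothetical helper: look up Process in the package named after the
# command and call it as a plain function.
sub Dispatch {
    my ($db, $command, @parms) = @_;
    my $process = "Demo${command}Loader"->can('Process');
    die "Unknown command $command.\n" if ! $process;
    return $process->($db, @parms);
}

print Dispatch(undef, 'Genome', '83333.1'), "\n";
```

Looking the function up with C<can> avoids compiling a new string for every
command and turns an unknown command into an ordinary exception.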

The currently-supported command-line options are as follows.

=over 4

=item user

Name suffix to be used for log files. If omitted, the PID is used.

=item trace

Numeric trace level. A higher trace level causes more messages to appear. The
default trace level is 2. Trace messages are written directly to the standard
output as well as to a C<trace>I<User>C<.log> file in the FIG temporary directory,
where I<User> is the value of the B<user> option above.

=item sql

If specified, turns on tracing of SQL activity.

=item background

Save the error and standard output to files. The files will be created
in the FIG temporary directory and will be named C<err>I<User>C<.log> and
C<out>I<User>C<.log>, respectively, where I<User> is the value of the
B<user> option above.

=item h

Display this command's parameters and options.

=item host

Alternate database host, if the database is located somewhere other than the
default. This is necessary on some Sapling machines to ensure we get a writable
copy of the database.

=item dbName

Alternate database name, if the database to be updated is not the default.

=back
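
To make the log-file naming rules above concrete, here is a small sketch that
derives the trace, error, and output file names from the B<user> option. The
helper C<LogFileNames> and the example directory C</tmp/fig> are hypothetical;
the location of the FIG temporary directory is site-specific and is simply
passed in as a parameter here.

```perl
use strict;
use warnings;

# Hypothetical helper: build the trace, error, and output log names.
# $tempDir stands in for the FIG temporary directory.
sub LogFileNames {
    my ($tempDir, $user) = @_;
    # Fall back to the process ID when no user suffix is given.
    my $suffix = (defined $user && $user ne '') ? $user : $$;
    return map { ($_ => "$tempDir/$_$suffix.log") } qw(trace err out);
}

my %logs = LogFileNames('/tmp/fig', 'loader');
print "$_: $logs{$_}\n" for sort keys %logs;
```

Supplying a B<user> value keeps the file names stable across runs, whereas the
PID default gives each run its own set of files.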

=cut

use strict;
use Tracer;
use Sapling;
use SaplingDataLoader;
use Stats;

use SaplingExpressionLoader;
use SaplingFunctionLoader;
use SaplingGenomeLoader;
use SaplingSubsystemLoader;
use SaplingTaxonomyLoader;

# Hash of valid commands.
use constant COMMANDS => { Expression => 1, Function => 1, Genome => 1, Subsystem => 1, Taxonomy => 1 };

# Get the command-line options and parameters.
my ($options, @parameters) = StandardSetup([qw(SaplingDataLoader) ],
                                           { host => ["", "alternate database host machine"],
                                             dbName => ["", "alternate database name"] },
                                           "<inputFile>",
                                           @ARGV);
# Create the statistics object.
my $stats = Stats->new();
# Ensure we catch errors.
eval {
    # Get the Sapling database.
    my $sap = Sapling->new(dbhost => $options->{host}, dbName => $options->{dbName});
    # Compute the input file. If no file name is specified, we use "-", which translates to
    # STDIN.
    my $inFileName = $parameters[0] || "-";
    if ($inFileName eq "-") {
        Trace("Commands will be taken from standard input.") if T(2);
    } else {
        Trace("Commands will be taken from $inFileName.") if T(2);
    }
    my $ih = Open(undef, "<$inFileName");
    # Ensure the input counter only refers to our input file.
    local $.;
    # Loop through the commands.
    while (! eof $ih) {
        # Get the next command.
        my ($command, @parms) = Tracer::GetLine($ih);
        $stats->Add(inputLines => 1);
        # Check for blank lines or comments.
        if (! $command) {
            # Here we have a blank line.
            $stats->Add(blankLines => 1);
        } elsif ($command =~ /^#/) {
            # Here we have a comment. Tabs are converted to spaces when this happens.
            $stats->Add(commentLines => 1);
            Trace(join(" ", $command, @parms)) if T(2);
        } elsif (! COMMANDS->{$command}) {
            # Here we have an invalid command.
            Trace("Unknown command $command on line $.: skipped.") if T(0);
            $stats->Add(errors => 1);
        } else {
            # Finally, we have a real command we can process.
            $stats->Add("command$command" => 1);
            # Form the string for executing the command.
            my $commandString = "Sapling${command}Loader::Process(\$sap, \@parms)";
            Trace("Beginning $command load.") if T(3);
            # Execute the command.
            my $commandStats = eval($commandString);
            # Check for errors.
            if ($@) {
                # Here the command failed.
                $stats->Add(errors => 1);
                Trace("Error in $command on line $.: $@") if T(0);
            } else {
                # Here the command worked. Fold in the statistics.
                $stats->Accumulate($commandStats);
            }
        }
    }
    Trace("Processing complete.") if T(2);
};
if ($@) {
    Trace("Script failed with error: $@") if T(0);
} else {
    Trace("Script complete.") if T(2);
}
Trace("Statistics for this run:\n" . $stats->Show()) if T(2);
