View of /FigKernelScripts/svr_in_runs.pl

Revision 1.5
Wed Feb 17 15:03:27 2010 UTC (9 years, 9 months ago) by parrello
Branch: MAIN
CVS Tags: mgrast_dev_08112011, mgrast_dev_08022011, rast_rel_2014_0912, myrast_rel40, mgrast_dev_05262011, mgrast_dev_04082011, rast_rel_2010_0928, mgrast_version_3_2, mgrast_dev_12152011, mgrast_dev_06072011, rast_rel_2010_0526, rast_rel_2014_0729, mgrast_dev_02212011, rast_rel_2010_1206, mgrast_release_3_0, mgrast_dev_03252011, rast_rel_2011_0119, mgrast_release_3_0_4, mgrast_release_3_0_2, mgrast_release_3_0_3, mgrast_release_3_0_1, mgrast_dev_03312011, mgrast_release_3_1_2, mgrast_release_3_1_1, mgrast_release_3_1_0, mgrast_dev_04132011, mgrast_dev_04012011, rast_rel_2010_0827, myrast_33, rast_rel_2011_0928, mgrast_dev_04052011, mgrast_dev_02222011, mgrast_dev_10262011, HEAD
Changes since 1.4: +2 -0 lines
Fixed POD errors. Added examples.

#!/usr/bin/perl -w 
use strict;
use SAPserver;
use Getopt::Long;

#   This is a SAS component.

=head1 svr_in_runs

    svr_in_runs <groups.tbl >operons.tbl

Make sequences of genes into operons.

This script takes as input groups of genes and finds all the operons containing
genes in each group. For the purposes of this script, an operon is a sequence of
genes that are close together on the same contig and pointing in the same direction.
The operons may contain other genes in the vicinity of the ones specified in the
original input.

The input must be a tab-delimited file. Each group should be the last field on
a line, and must be specified as a comma-separated list of FIG IDs. The operons
will also be rendered as a comma-separated list of FIG IDs. The output will consist
of the input lines with operons tacked onto the end. Since a group may
contain multiple operons, a single input line may produce multiple output lines.

This is a pipe command: the input is taken from the standard input and the output
is to the standard output.
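
The input and output layout described above can be sketched in plain Perl.
This is only an illustration: the helper names parse_group_line and
format_output_lines are hypothetical and not part of this script, and the FIG
IDs and operon strings are placeholder data rather than real server results.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of this script): split one tab-delimited
# input line and return the FIG IDs found in its last field.
sub parse_group_line {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\t/, $line;
    return () unless @fields;
    return split /,/, $fields[-1];
}

# Hypothetical helper: build one output line per operon, each made of the
# original input line with the operon appended as a new last field.
sub format_output_lines {
    my ($line, @operons) = @_;
    chomp $line;
    return map { join("\t", $line, $_) } @operons;
}

# Placeholder data: the IDs below are illustrative, not real results.
my $input = "groupA\tfig|83333.1.peg.4,fig|83333.1.peg.10\n";
my @ids   = parse_group_line($input);
my @out   = format_output_lines($input,
    "fig|83333.1.peg.3,fig|83333.1.peg.4",
    "fig|83333.1.peg.10,fig|83333.1.peg.11");
print "$_\n" for @out;
```

In this sketch a single input line with one group yields two output lines,
one for each operon, which matches the one-to-many behavior described above.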

=head2 Command-Line Options

=over 4

=item MaxGap

Maximum number of base pairs that can lie between two genes for them to
be considered part of the same operon. The default is 200.

=item JustFirst

If specified, then only the first gene in an operon will be included in the output.

=item url

The URL for the Sapling server, if it is to be different from the default.

=back
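
To make the effect of MaxGap concrete, here is a minimal local sketch of
gap-based grouping. The real grouping is done by the Sapling server, whose
internals are not shown in this script, so everything here is an assumption
made for illustration: the function name group_into_runs, the record fields
(id, contig, strand, left, right), the way the gap is measured, and the
coordinates themselves.

```perl
use strict;
use warnings;

# Illustration only: group genes into runs when neighbors share a contig
# and strand and are separated by at most $max_gap base pairs.
# The record fields are assumptions made for this sketch, not the
# Sapling server's data model.
sub group_into_runs {
    my ($max_gap, @genes) = @_;
    # Order the genes by contig, then by leftmost position.
    my @sorted = sort {
        $a->{contig} cmp $b->{contig} or $a->{left} <=> $b->{left}
    } @genes;
    my (@runs, $prev);
    for my $gene (@sorted) {
        # Assumed gap measure: bases strictly between the two genes.
        my $joins = $prev
            && $gene->{contig} eq $prev->{contig}
            && $gene->{strand} eq $prev->{strand}
            && $gene->{left} - $prev->{right} - 1 <= $max_gap;
        if ($joins) {
            push @{ $runs[-1] }, $gene->{id};
        } else {
            push @runs, [ $gene->{id} ];
        }
        $prev = $gene;
    }
    # Render each run as a comma-separated list of IDs.
    return map { join(",", @$_) } @runs;
}

# Made-up coordinates for three genes on one contig.
my @genes = (
    { id => "g1", contig => "c1", strand => "+", left => 100,  right => 400 },
    { id => "g2", contig => "c1", strand => "+", left => 450,  right => 900 },
    { id => "g3", contig => "c1", strand => "+", left => 1500, right => 2000 },
);
my @runs = group_into_runs(200, @genes);
print "$_\n" for @runs;
```

With a limit of 200 the first two genes fall into one run and the third
stands alone; raising the limit far enough would merge all three.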

=cut

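    my $MaxGap = 0;
    my $JustFirst = 0;
    my $url = '';

    # Use the script's base name in the usage message.
    $0 =~ m/([^\/]+)$/;
    my $self = $1;
    my $usage = "$self [--MaxGap=N --JustFirst --url=http://... ] <input >output";

    # Parse the command-line options documented above.
    my $rc = GetOptions("MaxGap=i" => \$MaxGap, "JustFirst" => \$JustFirst, "url=s" => \$url);

    if (!$rc) {
        die "\n   usage: $usage\n\n";
    }

    # Connect to the Sapling server, overriding the default URL only if one was given.
    my $ss = $url ? SAPserver->new(url => $url) : SAPserver->new();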
    
    my %args;
    
    if ($JustFirst) {
        $args{-justFirst} = $JustFirst;
    }
    if ($MaxGap) {
        $args{-maxGap} = $MaxGap;
    }
    
    my $line;
    while (defined($line = <STDIN>)) {
        # Remove the new-line.
        chomp $line;
        # Get the fields in the line.
        my @fields = split /\t/, $line;
        # The last field is the group.
        $args{-groups} = [$fields[$#fields]];
        # Make the runs for this group.
        my $res =  $ss->make_runs(%args);
        # Output the result.
        foreach my $run (@{$res->{0}}) {
            print join("\t", $line, $run) . "\n";
        }
    }


