[Bio] / FigKernelScripts / p3-collate.pl Repository:
ViewVC logotype

View of /FigKernelScripts/p3-collate.pl

Parent Directory Parent Directory | Revision Log Revision Log


Revision 1.1 - (download) (as text) (annotate)
Tue Aug 21 20:35:55 2018 UTC (14 months, 3 weeks ago) by parrello
Branch: MAIN
CVS Tags: HEAD
Changes copied from SEEDtk project.

=head1 Extract the First N Rows for Each Value of a Column

    p3-collate.pl N [options]

This script will read through a tab-delimited file on the standard input and output the first N rows for each specific
value found in the key column. For example, if we have a set of genomes sorted by quality and we ask for a 3-row sample
based on the species column, it will extract the 3 best-quality genomes for each species.

=head2 Parameters

The positional parameter must be the number of rows to extract for each value.

The standard input can be overridden using the options in L<P3Utils/ih_options>.

Use the options in L<P3Utils/col_options> to select the key column.
=cut

use strict;
use P3DataAPI;
use P3Utils;


# Get the command-line options.
my $opt = P3Utils::script_opts('N', P3Utils::col_options(), P3Utils::ih_options(),
        );
# Open the input file.
my $ih = P3Utils::ih($opt);
# Read the incoming headers.
my ($outHeaders, $keyCol) = P3Utils::process_headers($ih, $opt);
# Form the full header set and write it out.
if (! $opt->nohead) {
    P3Utils::print_cols($outHeaders);
}
# Get the number of rows to sample. The default is 1.
my ($N) = @ARGV;
$N //= 1;
if ($N < 1) {
    die "Collation specifies no output.";
}
# This is the collation hash.
my %groups;
# Loop through the input.
while (! eof $ih) {
    my $couplets = P3Utils::get_couplets($ih, $keyCol, $opt);
    for my $couplet (@$couplets) {
        my ($key, $row) = @$couplet;
        if (! $groups{$key}) {
            $groups{$key} = [$row];
        } else {
            my $group = $groups{$key};
            if (scalar @$group < $N) {
                push @$group, $row;
            }
        }
    }
}
# Write the output.
for my $key (sort keys %groups) {
    my $group = $groups{$key};
    for my $row (@$group) {
        P3Utils::print_cols($row);
    }
}

MCS Webmaster
ViewVC Help
Powered by ViewVC 1.0.3