View of /FigKernelScripts/svr_summarize_MG_output.pl

Revision 1.4
Wed Jun 9 20:43:16 2010 UTC (9 years, 5 months ago) by overbeek
Branch: MAIN
CVS Tags: mgrast_dev_08112011, mgrast_dev_08022011, rast_rel_2014_0912, myrast_rel40, mgrast_dev_05262011, mgrast_dev_04082011, rast_rel_2010_0928, mgrast_version_3_2, mgrast_dev_12152011, mgrast_dev_06072011, rast_rel_2014_0729, mgrast_dev_02212011, rast_rel_2010_1206, mgrast_release_3_0, mgrast_dev_03252011, rast_rel_2011_0119, mgrast_release_3_0_4, mgrast_release_3_0_2, mgrast_release_3_0_3, mgrast_release_3_0_1, mgrast_dev_03312011, mgrast_release_3_1_2, mgrast_release_3_1_1, mgrast_release_3_1_0, mgrast_dev_04132011, mgrast_dev_04012011, rast_rel_2010_0827, myrast_33, rast_rel_2011_0928, mgrast_dev_04052011, mgrast_dev_02222011, mgrast_dev_10262011, HEAD
Changes since 1.3: +2 -2 lines
extend the number of digits in the fraction of the total

#!/usr/bin/perl -w

#
# This is a SAS Component
#

#
# Copyright (c) 2003-2006 University of Chicago and Fellowship
# for Interpretations of Genomes. All Rights Reserved.
#
# This file is part of the SEED Toolkit.
#
# The SEED Toolkit is free software. You can redistribute
# it and/or modify it under the terms of the SEED Toolkit
# Public License.
#
# You should have received a copy of the SEED Toolkit Public License
# along with this program; if not write to the University of Chicago
# at info@ci.uchicago.edu or the Fellowship for Interpretation of
# Genomes at veronika@thefig.info or download a copy from
# http://www.theseed.org/LICENSE.TXT.
#

use strict;
use Data::Dumper;

=head1 svr_summarize_MG_output

=head2 Introduction

    svr_summarize_MG_output  < output.from.svr_assign_to_dna_using_figfams

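This simple program produces two summaries: one of the functions identified
and one of the OTUs identified.  Each OTU is represented by a representative
organism.  Input is read from stdin as tab-separated lines; the function is
taken from the fourth column and the OTU from the fifth.  The function
summary is sent to stdout, while the OTU summary is sent to stderr.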

=head3 Output Format

The function summary written to stdout is a 3-column table:

=over 4

=item * the number of hits against the function

=item * the fraction of the total hits this represents

=item * the function

=back

The OTU summary is also a 3-column table:

=over 4

=item * the number of hits that appear to uniquely identify the OTU

=item * the fraction of the OTU-assigned hits this represents

=item * a representative organism for the OTU

=back

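The two tables described above can be illustrated with a minimal sketch.
It is not the script itself: the helper C<summarize_counts> is a
hypothetical name, and the sample lines are made up, assuming only that
the function sits in the fourth tab-separated column and the OTU in the
fifth.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script): tally the values given and
# return rows of [count, fraction, value], largest count first.
sub summarize_counts {
    my (@values) = @_;
    my %count;
    $count{$_}++ for @values;
    my $total = scalar @values;
    return map { [ $count{$_}, sprintf("%0.6f", $count{$_} / $total), $_ ] }
           sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
}

# Made-up input lines: three ignored columns, then function, then OTU.
my @lines = (
    "r1\t1\t100\tDNA polymerase I\tEscherichia coli K12",
    "r2\t5\t90\tDNA polymerase I\t",
    "r3\t7\t80\tRecA protein\tEscherichia coli K12",
    "r4\t2\t60\tDNA polymerase I\tBacillus subtilis 168",
);

my (@functions, @otus);
for my $line (@lines) {
    my (undef, undef, undef, $function, $otu) = split /\t/, $line;
    push @functions, $function;
    push @otus, $otu if $otu;    # hits without an OTU count only toward functions
}

my @function_rows = summarize_counts(@functions);
my @otu_rows      = summarize_counts(@otus);

# Function summary goes to stdout, OTU summary to stderr.
print join("\t", @$_), "\n" for @function_rows;
print STDERR join("\t", @$_), "\n" for @otu_rows;
```

In this sketch the second sample line has no OTU, so the OTU fractions
are taken over three hits while the function fractions are taken over
all four.
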
=cut

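my $totF = 0;            # total hits (one per input line)
my $totO = 0;            # hits that carry an OTU
my(%functions,%otus);    # hit counts keyed by function and by OTU
while (defined($_ = <STDIN>))
{
    chomp;
    # Columns 4 and 5 hold the function and the OTU; the first three are ignored.
    my(undef,undef,undef,$function,$otu) = split(/\t/,$_);
    $totF++;
    $functions{$function}++;
    if ($otu)
    {
        $totO++;
        $otus{$otu}++;
    }
}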

# Function summary to stdout, most frequent function first.
foreach my $func (sort { $functions{$b} <=> $functions{$a} } keys(%functions))
{
    print join("\t",($functions{$func},sprintf("%0.6f",$functions{$func}/$totF),$func)),"\n";
}

# OTU summary to stderr, most frequent OTU first.
foreach my $otu (sort { $otus{$b} <=> $otus{$a} } keys(%otus))
{
    print STDERR join("\t",($otus{$otu},sprintf("%0.6f",$otus{$otu}/$totO),$otu)),"\n";
}

