[Bio] / FigKernelScripts / svr_split_fams.pl Repository:
ViewVC logotype

View of /FigKernelScripts/svr_split_fams.pl

Parent Directory Parent Directory | Revision Log Revision Log


Revision 1.1 - (download) (as text) (annotate)
Sat Nov 20 20:25:27 2010 UTC (9 years, 3 months ago) by overbeek
Branch: MAIN
CVS Tags: mgrast_dev_08112011, mgrast_dev_08022011, rast_rel_2014_0912, myrast_rel40, mgrast_dev_05262011, mgrast_dev_04082011, mgrast_version_3_2, mgrast_dev_12152011, mgrast_dev_06072011, rast_rel_2014_0729, mgrast_dev_02212011, rast_rel_2010_1206, mgrast_release_3_0, mgrast_dev_03252011, rast_rel_2011_0119, mgrast_release_3_0_4, mgrast_release_3_0_2, mgrast_release_3_0_3, mgrast_release_3_0_1, mgrast_dev_03312011, mgrast_release_3_1_2, mgrast_release_3_1_1, mgrast_release_3_1_0, mgrast_dev_04132011, mgrast_dev_04012011, myrast_33, rast_rel_2011_0928, mgrast_dev_04052011, mgrast_dev_02222011, mgrast_dev_10262011, HEAD
minor changes

use strict;
use SeedEnv;
use Data::Dumper;

#
# This is a SAS Component
#

=head1 svr_split_fams < protein.families > without.multiple 2> with.cycles

Detect cases in which BBHs have led to cycles

=head2 Introduction

If you are constructing protein families and using comparisons that are BBHs, you often wish
to avoid cases in which a single family has more than one gene/protein from a single genome.
This little utility splits out the sets with multiple genes from a single genome, writes
the removed entries to STDERR, and you can peruse them manually.

=head2 Output

The output consists of sets without cycles going to STDOUT and sets with cycles going
to STDERR

=cut

my $last = <STDIN>;
while ($last && ($last =~ /^(\S+)/))
{
    my $curr = $1;
    my @set = ();
    while ($last && ($last =~ /^(\S+)\t(\S+)/) && ($1 eq $curr))
    {
	push(@set,[$2,$last]);
	$last = <STDIN>;
    }
    &process(\@set);
}

sub process {
    my($set) = @_;

    my %genomes;
    my($i,$g);
    for ($i=0; ($i < @$set) && ($g = &SeedUtils::genome_of($set->[$i]->[0])) && (! $genomes{$g}); $i++) 
    {
	$genomes{$g} = 1;
    }

    foreach $_ (map { $_->[1] } @$set)
    {
	if ($i == @$set)
	{
	    print $_;
	}
	else
	{
	    print STDERR $_;
	}
    }
}

MCS Webmaster
ViewVC Help
Powered by ViewVC 1.0.3