View of /FigKernelScripts/svr_cohesion_groups.pl

Revision 1.4
Mon Apr 18 16:01:28 2011 UTC (8 years, 6 months ago) by fangfang
Branch: MAIN
CVS Tags: mgrast_dev_08112011, mgrast_dev_08022011, rast_rel_2014_0912, myrast_rel40, mgrast_dev_05262011, mgrast_version_3_2, mgrast_dev_12152011, mgrast_dev_06072011, rast_rel_2014_0729, mgrast_release_3_1_2, mgrast_release_3_1_1, mgrast_release_3_1_0, rast_rel_2011_0928, mgrast_dev_10262011, HEAD
Changes since 1.3: +4 -3 lines
update

#
# This is a SAS Component
#

#
# Copyright (c) 2003-2006 University of Chicago and Fellowship
# for Interpretations of Genomes. All Rights Reserved.
#
# This file is part of the SEED Toolkit.
#
# The SEED Toolkit is free software. You can redistribute
# it and/or modify it under the terms of the SEED Toolkit
# Public License.
#
# You should have received a copy of the SEED Toolkit Public License
# along with this program; if not write to the University of Chicago
# at info@ci.uchicago.edu or the Fellowship for Interpretation of
# Genomes at veronika@thefig.info or download a copy from
# http://www.theseed.org/LICENSE.TXT.
#

use strict;
use Data::Dumper;
use Carp;
use Getopt::Long;

=head1 svr_cohesion_groups

    svr_cohesion_groups [options] < tree.newick > cohesion_groups.table

This script classifies tips of a newick tree into cohesion groups
based on bootstrap values of tree branches.

=head1 Introduction

A cohesion group is a collection of protein sequences from various
organisms whose amino acid sequences assemble as a compact cluster on
a phylogenetic tree.

See Roy A. Jensen's cohesion group analysis (PubMed ID: 18322033).

=head2 Command-line options

=over 4

=item -c bootstrap_cutoff

Branch support threshold: subtrees whose root branch has a support value
greater than this cutoff are collapsed (D = 0.85).

=item -f max_CG_size_in_fraction

Max size of a cohesion group, expressed as a fraction of the number of
tips in the tree (D = 0.50).

=item -m max_CG_size

Max size of a cohesion group (D = number of tips in tree).

=item -o 

With the -o option, all orphan cohesion groups are labeled as 'Orp'.

=back 

=head2 Input

The input tree is a newick file read from STDIN.

=head2 Output

The output is a two-column table [ tip_id, cohesion_group_id ] written to STDOUT.
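
As a hedged illustration of consuming this table downstream, the sketch
below regroups the rows by cohesion group. It is not part of the SEED
Toolkit: the helper name group_tips_by_cg, the tip ids, and the group id
'CG1' are hypothetical, and it assumes each row is exactly a tip_id, a
tab, and a cohesion_group_id.

```perl

    use strict;
    use warnings;

    # Parse rows of the two-column table (tip_id <TAB> cohesion_group_id)
    # into a hash that maps each group id to a sorted list of its tips.
    sub group_tips_by_cg {
        my @lines = @_;                         # copy, so callers' data is untouched
        my %tips_in;
        for my $line (@lines) {
            chomp $line;
            next unless $line =~ /\S/;          # skip blank lines
            my ($tip, $cg) = split /\t/, $line;
            next unless defined $cg;            # skip rows without a second column
            push @{ $tips_in{$cg} }, $tip;
        }
        $_ = [ sort @$_ ] for values %tips_in;  # stable order within each group
        return \%tips_in;
    }

    # Hypothetical rows; real tip and group ids depend on the input tree.
    my @example = ("tipA\tCG1\n", "tipB\tCG1\n", "tipC\tOrp\n");
    my $groups  = group_tips_by_cg(@example);

    # Print one line per group: group id, member count, member list.
    for my $cg (sort keys %$groups) {
        my @tips = @{ $groups->{$cg} };
        print join("\t", $cg, scalar(@tips), join(",", @tips)), "\n";
    }

```

In practice the rows would come from this script's STDOUT (for example,
lines read from a pipe or a saved table file) rather than from a
hard-coded list.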

=cut

use AlignTree;
use ATserver;
use SeedUtils;

use ffxtree;
use gjoalignment;
use gjoseqlib;

my $usage = <<"End_of_Usage";

usage: svr_cohesion_groups [options] < tree.newick > cohesion_groups.table

       -c cutoff    - collapse subtrees whose root branch has a support
                      value greater than cutoff (D = 0.85)
       -f fraction  - max size of a cohesion group as a fraction of the
                      number of tips in tree (D = 0.50)
       -m size      - max size of a cohesion group (D = number of tips in tree)
       -o           - label all orphan groups as 'Orp'
       -s           - set the 'single_collapse' option passed to
                      ffxtree::make_cohesion_groups

End_of_Usage

my ($help, $cutoff, $fract, $maxcg, $orphan, $single);

GetOptions("h|help"         => \$help,
           "c|cutoff=f"     => \$cutoff,
           "f|fract=f"      => \$fract,
           "m|maxcg=i"      => \$maxcg,
           "o|orphan"       => \$orphan,
           "s|single"       => \$single);

$help and die $usage;

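# Apply defaults only when the option was not given, so that an explicit
# value of 0 on the command line is not silently replaced.
$cutoff = 0.85 unless defined $cutoff;
$fract  = 0.50 unless defined $fract;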

my $tree = ffxtree::read_tree();
my $opts = { 'cg_cutoff' => $cutoff, "max_fract" => $fract, 'max_cg_size' => $maxcg, 'show_orphan' => $orphan, 'single_collapse' => $single };
my $cg   = ffxtree::make_cohesion_groups($tree, $opts);

if ($cg) {
    # Sort by tip id so the output order is reproducible between runs.
    print join("\t", $_, $cg->{$_}). "\n" for sort keys %$cg;
}

