Diff of /FigKernelScripts/get_families_final.pl

revision 1.2, Fri Mar 7 16:04:44 2014 UTC revision 1.3, Fri Mar 21 18:11:18 2014 UTC
# Line 1  Line 1 
1    #
2    # This is a SAS Component
3    #
4    
5    
6    =head1 get_families_final
7    
8    Generate protein families (of isofunctional homologs) using kmer technology.
9    
10    ------
11    
12    Example:
13    
14        get_families_final -f Families/families -s Seqs.Fasta
15    
16    This simple program gathers the families from
17    
18        good families (those which were assigned functions and usually have a unique PEG)
19        bad.fixed families (those assigned a function and then subjected to a splitting test)
20        missed  (those families of PEGs with no assigned functions and clustered by similarity)
21    
22    and builds families.all, the final set of families.
23    
24    =head2 Command-Line Options
25    
26    =over 4
27    
28    =item -f FamilyFilesPrefix
29    
30    The prefix used when writing files recording subfamilies (and the final
31    families.all)
32    
33    =item -s Seqs.Fasta
34    
35    The directory from which the PEG translations for each genome are
36    read.
37    
38    =back
39    
40    =head2 Output Format
41    
42    Output is written to STDOUT and constitutes the derived protein families (which
43    include singletons).  An 8-column, tab-separated table is written:
44    
45        FamilyID - an integer
46        Function - function assigned to family
47        SubFunction - the Function and an integer (SubFunction) together uniquely
48                      determine the FamilyID.  Another way to look at it is
49    
50                        a) each family is assigned a unique ID and a function
51                        b) multiple families can have the same function (consider
52                           "hypothetical protein")
53                        c) the Function+SubFunction uniquely determine the FamilyID
54        PEG
55        LengthProt - the length of the translated PEG
56        Mean       - the mean length of PEGs in the family
57        StdDev     - standard deviation of lengths for family
58        Z-sc       - the Z-score associated with the length of this PEG
59    
60    =cut
61    
62  use strict;
63  use Data::Dumper;
64  use Getopt::Long;
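
The Output Format section above describes per-family length statistics (Mean, StdDev, Z-sc) without showing how they relate. The following is a minimal sketch of that relationship, not the code from this revision. The helper names `length_stats` and `family_rows`, the PEG identifiers, and the lengths are all hypothetical. It also assumes a population standard deviation (dividing by N) and a Z-score of 0 when a family has no length variation, such as a singleton; the actual script may handle both differently.

```perl
use strict;
use warnings;

# Hypothetical helper: mean, standard deviation and per-member Z-scores
# for a list of protein lengths.
# Assumption: population standard deviation (divide by N, not N-1).
sub length_stats {
    my @lengths = @_;
    my $n = @lengths;
    return (0, 0, []) unless $n;

    my $sum = 0;
    $sum += $_ for @lengths;
    my $mean = $sum / $n;

    my $ss = 0;
    $ss += ($_ - $mean) ** 2 for @lengths;
    my $sd = sqrt($ss / $n);

    # Assumption: with no length variation (e.g. a singleton) Z is reported as 0.
    my @z = map { $sd > 0 ? ($_ - $mean) / $sd : 0 } @lengths;
    return ($mean, $sd, \@z);
}

# Hypothetical helper: build the 8-column, tab-separated rows for one family.
# $members is a reference to a list of [PEG, LengthProt] pairs.
sub family_rows {
    my ($family_id, $function, $sub_function, $members) = @_;
    my ($mean, $sd, $z) = length_stats(map { $_->[1] } @$members);

    my @rows;
    for my $i (0 .. $#$members) {
        my ($peg, $len) = @{ $members->[$i] };
        push @rows, join("\t",
            $family_id, $function, $sub_function, $peg, $len,
            sprintf("%.3f", $mean),
            sprintf("%.3f", $sd),
            sprintf("%.3f", $z->[$i]));
    }
    return @rows;
}

# Made-up example data: one family with three PEGs.
my @members = (
    ['fig|1.1.peg.1', 100],
    ['fig|1.1.peg.2', 110],
    ['fig|1.1.peg.3', 120],
);
print "$_\n" for family_rows(1, 'hypothetical protein', 1, \@members);
```

Each printed row carries the family-level Mean and StdDev alongside the member-level LengthProt and Z-sc, so a reader of the table can spot unusually short or long PEGs within a family without recomputing anything.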
