[Bio] / FigKernelScripts / svr_mapped_genomes.pl Repository:
ViewVC logotype

Annotation of /FigKernelScripts/svr_mapped_genomes.pl

Parent Directory Parent Directory | Revision Log Revision Log


Revision 1.4 - (view) (download) (as text)

1 : overbeek 1.1 use strict;
2 :     use Data::Dumper;
3 :     use Carp;
4 :    
5 :     #
6 :     # This is a SAS Component
7 :     #
8 :    
9 :    
10 :     =head1 svr_mapped_genomes
11 :    
12 :     Get maps between a reference genome and a set of genomes to which
13 :     you wish to compare the reference genome.
14 :    
15 :     ------
16 :    
17 :     Example:
18 :    
19 :     svr_mapped_genomes -g 83333.1 -d Maps < genomes.to.compare.against
20 :    
21 :     would construct a directory of mappings between genes 83333.1 and the genomes
22 :     read from standard input. The maps would come back as files in the directory
23 :     "Maps" (which would get created if necessary).
24 :    
25 :     ------
26 :    
27 :     The standard input should be a tab-separated table (i.e., each line
28 :     is a tab-separated set of fields). Normally, the last field in each
29 :     line would contain the genome for which functions are being requested.
30 :     If some other column contains the genomes, use
31 :    
32 :     -c N
33 :    
34 :     where N is the column (from 1) that contains the genome in each case.
35 :    
36 :     =head2 Command-Line Options
37 :    
38 :     =over 4
39 :    
40 :     =item -c Column
41 :    
42 :     This is used only if the column containing PEGs is not the last.
43 :    
44 :     =item -g genome
45 :    
46 :     This designates the reference genome
47 :    
48 :     =item -d directory
49 :    
50 :     This designates the directory into which maps are written. It will be created
51 :     if it does not already exist
52 :    
53 :     =back
54 :    
55 :     =head2 Output Format
56 :    
57 :     The output is written as "maps" in the designated directory. Each map
58 :     is a file of 18 fields, tab-separated:
59 :    
60 :     1 The ID of a PEG in genome 1.
61 :     2 The ID of a PEG in genome 2 that is our best estimate of a "corresponding gene".
62 :     3 Count of the number of pairs of matching genes were found in the context.
63 :     4 Pairs of corresponding genes from the contexts.
64 :     5 The function of the gene in genome 1.
65 :     6 The function of the gene in genome 2.
66 :     7 Comma-separated list of aliases for the gene in genome 1 (any protein with an identical sequence is considered an alias, whether or not it is actually the name of the same gene in the same genome).
67 :     8 Comma-separated list of aliases for the gene in genome 2 (any protein with an identical sequence is considered an alias, whether or not it is actually the name of the same gene in the same genome).
68 :     9 Bi-directional best hits will contain "<=>" in this column; otherwise, "->" will appear.
69 :     10 Percent identity over the region of the detected match.
70 :     11 The P-score for the detected match.
71 :     12 Beginning match coordinate in the protein encoded by the gene in genome 1.
72 :     13 Ending match coordinate in the protein encoded by the gene in genome 1.
73 :     14 Length of the protein encoded by the gene in genome 1.
74 :     15 Beginning match coordinate in the protein encoded by the gene in genome 2.
75 :     16 Ending match coordinate in the protein encoded by the gene in genome 2.
76 :     17 Length of the protein encoded by the gene in genome 2.
77 :     18 Bit score for the match. Divide by the length of the longer PEG to get what we often refer to as a "normalized bit score".
78 :    
79 :     =cut
80 :    
81 :     use SeedUtils;
82 :     use SAPserver;
83 :     my $sapObject = SAPserver->new();
84 :    
85 :     my $usage = "usage: svr_mapped_genomes -g genome -d directory [-c column]";
86 :    
87 :     my($column,$genome,$directory);
88 :     while ($ARGV[0] && ($ARGV[0] =~ /^-/))
89 :     {
90 :     $_ = shift @ARGV;
91 :     if ($_ =~ s/^-c//) { $column = ($_ || shift @ARGV) }
92 :     elsif ($_ =~ s/^-g//) { $genome = ($_ || shift @ARGV) }
93 :     elsif ($_ =~ s/^-d//) { $directory = ($_ || shift @ARGV) }
94 :     else { die "Bad Flag: $_" }
95 :     }
96 :    
97 :     $genome || die "You need to specify a reference genome";
98 :     &SeedUtils::verify_dir($directory);
99 : parrello 1.4 my @lines = map { chomp; [split(/\t/,$_)] } <STDIN>;
100 : overbeek 1.1 if (! $column) { $column = @{$lines[0]} }
101 :     my @against = map { $_->[$column-1] } @lines;
102 :     foreach my $other (@against)
103 :     {
104 : overbeek 1.2 next if (($other eq $genome) || (-s "$directory/$genome-$other"));
105 : overbeek 1.1 my $map = $sapObject->gene_correspondence_map( -genome1 => $genome,
106 :     -genome2 => $other,
107 :     -fullOutput => 1,
108 :     -passive => 0 );
109 :     if ($map)
110 :     {
111 :     open(MAP,">","$directory/$genome-$other") || die "could not open $directory/$genome-$other";
112 :     foreach $_ (@$map)
113 :     {
114 :     print MAP join("\t",@$_),"\n";
115 :     }
116 :     close(MAP);
117 :     }
118 :     }

MCS Webmaster
ViewVC Help
Powered by ViewVC 1.0.3