
Annotation of /FigKernelScripts/svr_aliases_of.pl



Revision 1.7 (line annotations: overbeek 1.1; olson 1.2; parrello 1.3, 1.4 and 1.7; disz 1.5 and 1.6)

use strict;
use Data::Dumper;
use Carp;

#
# This is a SAS Component
#


use SeedEnv;
my $sapObject = SAPserver->new();

=head1 svr_aliases_of

Return all identifiers for genes in the database that are protein-sequence-equivalent to the specified identifiers. The identifiers are assumed to be in their natural form (without prefixes). For each identifier, the script finds the protein sequences it identifies, then returns every identifier for each of those protein sequences and for every gene that produces one of them.

Alternatively, you can ask for identifiers that are precisely equivalent, that is, identifiers that refer to the same location on the same genome.
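The script does not compare sequences itself; it delegates the lookup to the server call `equiv_sequence_ids`. As an illustration only, here is a minimal Perl sketch of the underlying idea, assuming an in-memory table from identifier to a single protein sequence. The identifiers, sequences, `%seq_of` and `equivalent_ids` are invented for this example, and the sketch covers exact sequence identity only, not the `-precise` location check.

```perl
use strict;
use warnings;

# Invented sample data (identifier => protein sequence). In the real script
# this knowledge lives on the server behind equiv_sequence_ids.
my %seq_of = (
    'fig|3702.1.peg.1'  => 'MKTAYIAK',
    'fig|83333.1.peg.7' => 'MKTAYIAK',
    'NP_000001'         => 'MKTAYIAK',
    'fig|3702.1.peg.2'  => 'MSLLNQ',
);

# Invert the table: protein sequence => identifiers sharing that sequence.
my %ids_for_seq;
push @{ $ids_for_seq{ $seq_of{$_} } }, $_ for sort keys %seq_of;

# Return every identifier (including $id itself) whose protein sequence
# is identical to the one $id refers to; unknown IDs yield an empty list.
sub equivalent_ids {
    my ($id) = @_;
    my $seq = $seq_of{$id};
    return () unless defined $seq;
    return @{ $ids_for_seq{$seq} };
}

print join(',', equivalent_ids('fig|3702.1.peg.1')), "\n";
```

Inverting the table once makes each query a pair of hash lookups instead of a scan over all identifiers.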

------
Example: svr_all_features 3702.1 peg | svr_aliases_of

would produce a 2-column table. The first column would contain
PEG IDs for genes occurring in genome 3702.1, and the second
would contain an alias of that gene, with one alias per output line.

The aliases are IDs of genes that have precisely the same
protein sequence, but may or may not be from the same genome.
------
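The main loop in revision 1.7 prints each alias on its own line (`print "$line\t$result\n"`). If a single row per PEG with a comma-separated alias list is wanted instead, the rows can be collapsed afterwards. This is a minimal sketch; `collapse_rows` is a hypothetical helper, the sample IDs are invented, and it assumes the last tab-separated field of each row is the alias and everything before it is the key.

```perl
use strict;
use warnings;

# Collapse rows of the form "<input line><TAB><alias>" into one row per
# input line, with the aliases joined by commas. The last tab-separated
# field is taken as the alias; everything before it is the key.
# Keys keep the order in which they were first seen; rows without a tab
# are skipped.
sub collapse_rows {
    my (%aliases, @order);
    for my $row (@_) {
        my ($key, $alias) = $row =~ /^(.*)\t([^\t]*)$/ or next;
        push @order, $key unless exists $aliases{$key};
        push @{ $aliases{$key} }, $alias;
    }
    return map { "$_\t" . join(',', @{ $aliases{$_} }) } @order;
}

# Invented rows, one alias per line as the script prints them.
my @rows = (
    "fig|3702.1.peg.1\tNP_000001",
    "fig|3702.1.peg.1\tfig|83333.1.peg.7",
    "fig|3702.1.peg.2\tQ9XYZ1",
);
print "$_\n" for collapse_rows(@rows);
```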

The standard input should be a tab-separated table (i.e., each line
is a tab-separated set of fields). Normally, the last field in each
line contains the PEG for which aliases are being requested.
If some other column contains the PEGs, use

    -c N

where N is the column (counting from 1) that contains the PEG.

This is a pipe command. The input is taken from the standard input, and the
output is written to the standard output.
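The column convention above can be expressed as a small helper. This sketches the documented behavior only, not `ScriptThing::GetBatch`, whose implementation is not shown here; `peg_from_line` is a hypothetical name and the IDs are invented.

```perl
use strict;
use warnings;

# Return the PEG ID from one tab-separated input line. $column is 1-based,
# as with -c N; when it is undefined, the last field is used.
sub peg_from_line {
    my ($line, $column) = @_;
    chomp $line;                           # drop the trailing newline, if any
    my @fields = split /\t/, $line, -1;    # -1 keeps trailing empty fields
    return defined $column ? $fields[$column - 1] : $fields[-1];
}

# Example lines with invented IDs.
print peg_from_line("x\ty\tfig|3702.1.peg.5\n"), "\n";        # last column
print peg_from_line("fig|3702.1.peg.5\tkinase\n", 1), "\n";   # -c 1
```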

=head2 Command-Line Options

=over 4

=item -c Column

This is used only if the column containing PEGs is not the last.

=item -r regexp

This is used to restrict the aliases being returned. Only aliases matching the regexp are returned.

=item -precise

Only identifiers that refer to the same location on the same genome will be returned. If this option is specified, identifiers that refer to proteins rather than features will return no result.

=back
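The script parses these flags with a hand-written loop. The following sketch shows an equivalent parser built on the core `Getopt::Long` module; this is a swapped-in technique, not what the script does, and `parse_options` is a hypothetical name. One behavioral difference: the hand-written loop also accepts an attached value such as `-c3`, which `Getopt::Long` does not accept in its default configuration.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

my $usage = "usage: svr_aliases_of [-c column] [-r regexp] [-precise]";

# Parse the documented options from an argument list (normally @ARGV).
# Returns a hash ref; dies with the usage message on an unknown flag.
sub parse_options {
    my @args = @_;
    my %opt  = (column => undef, regexp => undef, precise => 0);
    GetOptionsFromArray(
        \@args,
        'c=i'     => \$opt{column},     # -c N: 1-based PEG column
        'r=s'     => \$opt{regexp},     # -r regexp: alias filter
        'precise' => \$opt{precise},    # -precise: same-location IDs only
    ) or die "$usage\n";
    return \%opt;
}

my $opt = parse_options(@ARGV);
```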
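
=head2 Output Format

The standard output is a tab-delimited file. Each output line consists of
an input line with an extra column added that contains one alias; an input
line with several aliases produces several output lines. Input lines for
which no aliases are found are written to the standard error output.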
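The per-line output, including the `-r` filter, can be sketched as follows. This is a minimal illustration with invented IDs; `output_rows` is a hypothetical helper that follows the one-alias-per-row layout printed by the main loop.

```perl
use strict;
use warnings;

# Return the output rows for one input line: the line followed by one alias
# per row. When $regexp is defined (the -r option), only aliases that match
# it are kept.
sub output_rows {
    my ($line, $aliases, $regexp) = @_;
    my @kept = @$aliases;
    @kept = grep { /$regexp/ } @kept if defined $regexp;
    return map { "$line\t$_" } @kept;
}

# Invented IDs: keep only FIG-style aliases.
print "$_\n" for output_rows("fig|3702.1.peg.1",
                             ["NP_000001", "fig|83333.1.peg.7"],
                             '^fig\|');
```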

=cut


my $usage = "usage: svr_aliases_of [-c column] [-r regexp] [-precise]";

my $column;
my $regexp;
my $precise = 0;
while ($ARGV[0] && ($ARGV[0] =~ /^-/))
{
    $_ = shift @ARGV;
    if    ($_ =~ s/^-c//)       { $column = ($_ || shift @ARGV) }
    elsif ($_ =~ s/^-r//)       { $regexp = ($_ || shift @ARGV) }
    elsif ($_ =~ s/^-precise//) { $precise = 1 }
    else                        { die "Bad Flag: $_\n$usage\n" }
}

ScriptThing::AdjustStdin();
# The main loop processes chunks of input, 1000 lines at a time.
while (my @tuples = ScriptThing::GetBatch(\*STDIN, undef, $column)) {
    # Ask the server for results.
    my $document = $sapObject->equiv_sequence_ids(-ids => [map { $_->[0] } @tuples],
                                                  -precise => $precise);
    # Loop through the IDs, producing output.
    for my $tuple (@tuples) {
        my ($id, $line) = @$tuple;
        # Get this feature's alias data.
        my $results = $document->{$id};
        # Did we get something?
        if (! $results) {
            # No. Echo the input line to STDERR to flag the failed lookup.
            print STDERR "$line\n";
        } else {
            # Loop through the results for this ID.
            for my $result (@$results) {
                # Honor -r: skip aliases that do not match the regexp.
                next if defined $regexp && $result !~ /$regexp/;
                # Print the output line.
                print "$line\t$result\n";
            }
        }
    }
}
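The main loop batches its input so that each server round trip covers many IDs. Below is a self-contained sketch of the same pattern, assuming a code reference `$lookup` that stands in for `$sapObject->equiv_sequence_ids` and returns a hash ref of ID to alias list. `process_batches`, `$lookup` and the batch-size parameter are hypothetical, and unlike the script the sketch always reads the ID from the last column.

```perl
use strict;
use warnings;

# Read tab-separated lines from $in, take the ID from the last column, and
# look IDs up $size at a time through $lookup (a code ref that receives an
# array ref of IDs and returns a hash ref of ID => array ref of aliases).
# Matches go to $out, one alias per row; lines with no entry go to $err.
sub process_batches {
    my ($in, $out, $err, $lookup, $size) = @_;
    my @batch;
    my $flush = sub {
        return unless @batch;
        my $found = $lookup->([ map { $_->[0] } @batch ]);
        for my $tuple (@batch) {
            my ($id, $line) = @$tuple;
            my $results = $found->{$id};
            if (!$results) {
                print {$err} "$line\n";
            } else {
                print {$out} "$line\t$_\n" for @$results;
            }
        }
        @batch = ();
    };
    while (defined(my $line = <$in>)) {
        chomp $line;
        next if $line eq '';
        my @fields = split /\t/, $line;
        push @batch, [ $fields[-1], $line ];
        $flush->() if @batch >= $size;
    }
    $flush->();
}
```

To mirror the script, `$lookup` could wrap the server call used in the loop, e.g. `sub { $sapObject->equiv_sequence_ids(-ids => $_[0], -precise => $precise) }`, with `\*STDIN`, `\*STDOUT` and `\*STDERR` as the handles.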
