#!/usr/bin/perl -w
# Revision 1.1 (parrello)

=head1 Add / Delete / Change Features

This script runs through a set of transaction files, adding, deleting, and changing
features in the FIG data store. The command takes three input parameters. The first is
a command. The second specifies a directory full of transaction files. The third
specifies a file that tells us which feature IDs are available for each organism.

C<TransactFeatures> I<[options]> I<command> I<transactionDirectory> I<idFile>

The supported commands are

=over 4

=item count

Count the number of IDs needed to process the ADD and CHANGE transactions. This
will produce a listing of the number of feature IDs needed for each
organism and feature type. This command is mostly a sanity check: it provides
useful statistics without changing the data store.

=item register

Create an ID file by requesting IDs from the clearinghouse. This performs the
same function as B<count>, but takes the additional step of creating an ID
file that can be used to process the transactions.

=item process

Process the transactions and update the FIG data store. This will also create
a copy of each transaction file in which the pseudo-IDs have been replaced by
real IDs.

=back

=head2 The Transaction File

Each transaction file is a standard tab-delimited file, one transaction per line. The
name of the file is C<tbl_diff_>I<org> where I<org> is an organism ID. All records in
the transaction file refer to transactions against the organism encoded in the file
name.

The file must specify IDs for new features, but the real IDs cannot be known until
they are requested from the SEED clearinghouse. Therefore, each new ID is specified
in a special format consisting of the feature type (C<peg>, C<rna>, and so forth)
followed by a dot and the 0-based ordinal number of the new ID within that
feature type. So, for example, if the transaction file consists of a delete,
a change, and two adds, it might look like this

    delete fig|83333.1.peg.2
    change fig|83333.1.peg.6 peg.0 ...
    add peg.1 ...
    add rna.0 ...

Note that the old feature IDs do not participate in the numbering process, and the RNA
numbering is independent of the PEG numbering. In the discussion below of transaction
types, a field named I<newID> will always indicate one of these type/number pairs.
So, the field setup for the B<change> command is

    change fid newID locations aliases translation

and the I<newID> corresponds to the C<peg.0> in the example above.
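The mapping from a pseudo-ID to a real ID can be sketched as follows. This is only an illustration under stated assumptions: the helper name C<real_id_for> is hypothetical and not part of this script, and the starting numbers (4500 for pegs, 120 for RNAs) are made up.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of this script: converts a pseudo-ID such
# as "peg.1" into a real feature ID, given the first free ID number for
# each feature type of the genome.
sub real_id_for {
    my ($genomeID, $pseudoID, $firstFree) = @_;
    # Split the pseudo-ID into its feature type and 0-based ordinal.
    my ($ftype, $ordinal) = $pseudoID =~ /^([a-z]+)\.(\d+)$/
        or die "Invalid pseudo-ID '$pseudoID'\n";
    die "No ID range for $genomeID.$ftype\n"
        unless defined $firstFree->{$ftype};
    # The real number is the first free number plus the ordinal.
    my $num = $firstFree->{$ftype} + $ordinal;
    return "fig|$genomeID.$ftype.$num";
}

# Assumed starting numbers, for illustration only.
my %firstFree = (peg => 4500, rna => 120);
print real_id_for("83333.1", $_, \%firstFree), "\n" for qw(peg.0 peg.1 rna.0);
```

Because each feature type has its own starting number, C<peg.1> and C<rna.0> are resolved independently of each other, which matches the independent numbering described above.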

The first field of each record is the transaction type. The list of subsequent fields
depends on this type. The type is matched without regard to case.

=over 4

=item DELETE fid

Deletes a feature. The feature is marked as deleted in the FIG database, which
causes it to be skipped or ignored by most of the SEED software. The ID of the
feature to be deleted is the second field (I<fid>).

=item ADD newID locations translation

Adds a new feature. The I<newID> indicates the feature type and its ordinal number.
The location is a comma-separated list of location strings. The translation is the
protein translation for the location. If the translation is omitted, then it will
be generated from the location information in the normal way.

=item CHANGE fid newID locations aliases translation

Changes an existing feature. The current copy of the feature is marked as deleted,
and a new feature is created with a new ID. All annotations and assignments are
transferred from the deleted feature to the new one. The location is a
comma-separated list of location strings. The aliases are specified as a comma-delimited
list of alternate names for the feature. These replace any existing aliases for the
old feature. If the alias list is omitted, no aliases will be assigned to the new
feature. The translation is the protein translation for the location. If the
translation is omitted, then it will be generated from the location information in the
normal way.

=back
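The record layouts above can be illustrated with a small parsing sketch. The helper C<parse_transaction> and the location string C<NC_000913_100_400> are hypothetical and exist only for this example; the script itself dispatches on the first field and passes the remaining fields positionally.

```perl
use strict;
use warnings;

# Field names for each transaction type, as listed above.
my %FIELDS = (
    delete => [qw(fid)],
    add    => [qw(newID locations translation)],
    change => [qw(fid newID locations aliases translation)],
);

# Hypothetical helper: returns a hash reference for a recognized record,
# or undef when the transaction type is unknown.
sub parse_transaction {
    my ($line) = @_;
    chomp $line;
    my ($type, @values) = split /\t/, $line;
    $type = lc(defined $type ? $type : '');
    my $names = $FIELDS{$type} or return;
    my %record = (type => $type);
    # Optional trailing fields that are absent simply stay undefined.
    @record{@$names} = @values[0 .. $#$names];
    return \%record;
}

# The location string here is a made-up placeholder.
my $rec = parse_transaction("change\tfig|83333.1.peg.6\tpeg.0\tNC_000913_100_400\n");
print "$rec->{type}: $rec->{fid} -> $rec->{newID} at $rec->{locations}\n";
```

Naming the fields makes it easy to see which optional values (aliases, translation) were omitted from a given record.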

=head2 The ID File

The ID file is a tab-delimited file containing one record for each feature type
of each organism that has a transaction file. Each record consists of three
fields.

=over 4

=item orgID

The ID of the organism being updated.

=item ftype

The relevant feature type.

=item firstNumber

The first available ID number for the organism and feature type.

=back

This file's primary purpose is to tell us how to create the feature IDs
for features we'll be adding to the data store, whether via a straight
B<add> or a B<change> that deletes an old ID and recreates the feature with a
new ID.

If we need new IDs for an organism or feature type not listed in this ID file,
an error will be thrown.
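A minimal sketch of reading such a file is shown below. The helper name C<read_id_file> and the sample records are assumptions made for illustration; the script performs the equivalent loop inline and raises an error when a needed range is missing.

```perl
use strict;
use warnings;

# Hypothetical helper: reads orgID / ftype / firstNumber records from an
# open filehandle into a hash keyed by "orgID.ftype".
sub read_id_file {
    my ($fh) = @_;
    my %firstNumber;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';
        my ($orgID, $ftype, $first) = split /\t/, $line;
        $firstNumber{"$orgID.$ftype"} = $first;
    }
    return \%firstNumber;
}

# Made-up sample data, read through an in-memory filehandle.
my $sample = "83333.1\tpeg\t4500\n83333.1\trna\t120\n";
open(my $fh, '<', \$sample) or die "Cannot open sample data: $!";
my $ids = read_id_file($fh);
close $fh;

for my $key ('83333.1.peg', '100226.1.peg') {
    if (defined $ids->{$key}) {
        print "$key starts at $ids->{$key}\n";
    } else {
        print "$key has no ID range; processing would stop here\n";
    }
}
```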

=head2 Command-Line Options

The command-line options for this script are as follows.

=over 4

=item trace

Numeric trace level. A higher trace level causes more messages to appear. The
default trace level is 3.

=back

=cut

use strict;
use Tracer;
use DocUtils;
use TestUtils;
use Cwd;
use File::Copy;
use File::Path;
use FIG;
use Stats;

# Get the command-line options.
my ($options, @parameters) = Tracer::ParseCommand({ trace => 3 }, @ARGV);
# Set up tracing.
my $traceLevel = $options->{trace};
TSetup("$traceLevel Tracer DocUtils FIG", "TEXT");
# Get the FIG object.
my $fig = FIG->new();
# Get the command.
my $mainCommand = lc shift @parameters;
Trace("$mainCommand command specified.") if T(2);

# Create the ID table. This maps each organism/ftype pair to the currently-
# available ID number. If we're counting, we leave it empty. If we're not
# counting, we need to read it in.
my %idHash = ();
if ($mainCommand eq 'process') {
    my $inCount = 0;
    Open(\*IDFILE, "<$parameters[1]");
    while (my $idRecord = <IDFILE>) {
        chomp $idRecord;
        my ($orgID, $ftype, $firstNumber) = split /\t/, $idRecord;
        $idHash{"$orgID.$ftype"} = $firstNumber;
        $inCount++;
    }
    Trace("$inCount ID ranges read in from $parameters[1].") if T(2);
}

# Create some counters we can use for statistical purposes.
my $stats = Stats->new("genomes", "add", "change", "delete");
# Verify that the organism directory exists.
if (! -d $parameters[0]) {
    Confess("Directory of genome files \"$parameters[0]\" not found.");
} else {
    # Here we have a valid directory, so we need the list of transaction
    # files in it.
    my $orgsFound = 0;
    my %transFiles = ();
    my @transDirectory = OpenDir($parameters[0], 1);
    # The next step is to create a hash of organism IDs to file names. This
    # saves us some painful parsing later.
    for my $transFileName (@transDirectory) {
        if ($transFileName =~ /^tbl_diff_(\d+\.\d+)$/) {
            $transFiles{$1} = "$parameters[0]/$transFileName";
            $orgsFound++;
        }
    }
    Trace("$orgsFound genome transaction files found in directory $parameters[0].") if T(2);
    if (! $orgsFound) {
        # Report the transaction directory, which is the first remaining parameter.
        Confess("No \"tbl_diff\" files found in directory $parameters[0].");
    } else {
        # Loop through the organisms.
        for my $genomeID (sort keys %transFiles) {
            Trace("Processing changes for $genomeID.") if T(3);
            # Create a statistics object for this organism.
            my $orgStats = Stats->new("add", "change", "delete");
            # Create a control block for passing around our key data.
            my $controlBlock = { stats => $orgStats, genomeID => $genomeID,
                                 idHash => \%idHash, options => $options,
                                 fig => $fig, command => $mainCommand };
            # Open the organism file.
            my $orgFileName = $transFiles{$genomeID};
            Open(\*TRANS, "<$orgFileName");
            my $tranCount = 0;
            # If we're processing rather than counting, open a file for
            # writing out corrected transactions.
            if ($mainCommand eq 'process') {
                Open(\*TRANSOUT, ">$orgFileName.tbl");
            }
            # Loop through the organism's data.
            while (my $transaction = <TRANS>) {
                # Parse the record.
                chomp $transaction;
                my @fields = split /\t/, $transaction;
                $tranCount++;
                # Save the record number in the control block.
                $controlBlock->{line} = $tranCount;
                # Process according to the transaction type.
                my $command = lc shift @fields;
                if ($command eq 'add') {
                    Add($controlBlock, @fields);
                } elsif ($command eq 'delete') {
                    Delete($controlBlock, @fields);
                } elsif ($command eq 'change') {
                    Change($controlBlock, @fields);
                } else {
                    $orgStats->AddMessage("Invalid command $command in line $tranCount for genome $genomeID");
                }
                $orgStats->Add($command, 1);
            }
            Trace("Statistics for $genomeID\n\n" . $orgStats->Show()) if T(3);
            # Merge the statistics for this run into the global statistics object.
            $stats->Accumulate($orgStats);
            $stats->Add("genomes", 1);
            # Close the transaction files.
            close TRANS;
            if ($mainCommand eq 'process') {
                close TRANSOUT;
            }
        }
    }
    Trace("Statistics for this run\n\n" . $stats->Show()) if T(1);
    # If we're counting, we need to write out the counts file or allocate IDs
    # from the clearinghouse.
    if ($mainCommand ne "process") {
        # Loop through the ID hash, printing the counts. We will also write them
        # to a file called "counts.tbl".
        my $countfile = "$parameters[0]/counts.tbl";
        Open(\*COUNTFILE, ">$countfile");
        print "\nTable of Counts\n";
        for my $idKey (keys %idHash) {
            $idKey =~ /^(\d+\.\d+)\.([a-z]+)$/;
            my ($org, $ftype) = ($1, $2);
            my $count = $idHash{$idKey};
            print "$idKey\t$count\n";
            print COUNTFILE "$org\t$ftype\t$count\n";
        }
        close COUNTFILE;
        if ($mainCommand eq "register") {
            # Here we are registering as well as counting. This process also produces
            # the ID file.
            Trace("Submitting ID file to clearinghouse.") if T(2);
            system("register_features_batch <$countfile >$parameters[1]");
            Trace("Clearinghouse request complete.") if T(2);
        }
    }
    Trace("Processing complete.") if T(1);
}

=head2 Utility Methods

=head3 Add

C<< Add($controlBlock, $newID, $locations, $translation); >>

Add a new feature to the data store.

=over 4

=item controlBlock

Reference to a hash containing the data structures required to manage feature
transactions.

=item newID

ID to give to the new feature.

=item locations

Location of the new feature, in the form of a comma-separated list of location
strings in SEED format.

=item translation (optional)

Protein translation string for the new feature. If this field is omitted and
the feature is a peg, the translation will be generated by normal means.

=back

=cut

sub Add {
    my ($controlBlock, $newID, $locations, $translation) = @_;
    my $fig = $controlBlock->{fig};
    # Extract the feature type and ordinal number from the new ID.
    my ($ftype, $ordinal, $key) = ParseNewID($controlBlock, $newID);
    # If we're counting, we need to count the ID. Otherwise, we need to
    # add the new feature.
    if ($controlBlock->{command} ne 'process') {
        # An invalid ID has no key; ParseNewID has already recorded the error.
        $controlBlock->{idHash}->{$key}++ if defined $key;
    } else {
        # Here we need to add the new feature. An ADD transaction carries no
        # aliases, so the alias argument is an empty string.
        my $realID = AddFeature($controlBlock, $ordinal, $key, $ftype,
                                $locations, "", $translation);
        Trace("Feature $realID added for pseudo-ID $newID.") if T(4);
        # Write a corrected transaction to the transaction output file.
        print TRANSOUT "add\t$realID\t$locations\t$translation\n";
    }
}

=head3 Change

C<< Change($controlBlock, $fid, $newID, $locations, $aliases, $translation); >>

Replace a feature in the data store. The feature will be marked for deletion and
a new feature will be put in its place.

This is a much more complicated process than adding a feature. In addition to
the add, we have to create new aliases and transfer across the assignment and
the annotations.

=over 4

=item controlBlock

Reference to a hash containing the data structures required to manage feature
transactions.

=item fid

ID of the feature being changed.

=item newID

New ID to give to the feature.

=item locations

New location to give to the feature, in the form of a comma-separated list of location
strings in SEED format.

=item aliases (optional)

A new list of alias names for the feature.

=item translation (optional)

New protein translation string for the feature. If this field is omitted and
the feature is a peg, the translation will be generated by normal means.

=back

=cut

sub Change {
    my ($controlBlock, $fid, $newID, $locations, $aliases, $translation) = @_;
    my $fig = $controlBlock->{fig};
    # Extract the feature type and ordinal number from the new ID.
    my ($ftype, $ordinal, $key) = ParseNewID($controlBlock, $newID);
    # If we're counting, we need to count the ID. Otherwise, we need to
    # replace the feature.
    if ($controlBlock->{command} ne 'process') {
        # An invalid ID has no key; ParseNewID has already recorded the error.
        $controlBlock->{idHash}->{$key}++ if defined $key;
    } else {
        # Here we can go ahead and change the feature. First, we must
        # get the old feature's assignment and annotations. Note that
        # for the annotations we ask for the time in its raw format.
        my @functions = $fig->function_of($fid);
        my @annotations = $fig->feature_annotations($fid, 1);
        # Create some counters.
        my ($assignCount, $annotateCount) = (0, 0);
        # Add the new version of the feature and get its ID.
        my $realID = AddFeature($controlBlock, $ordinal, $key, $ftype, $locations,
                                $aliases, $translation);
        # Copy over the assignments.
        for my $assignment (@functions) {
            my ($user, $function) = @{$assignment};
            $fig->assign_function($realID, $user, $function);
            $assignCount++;
        }
        # Copy over the annotations. The text gets its own variable name so
        # that it does not shadow the loop variable.
        for my $annotation (@annotations) {
            my ($oldID, $timestamp, $user, $text) = @{$annotation};
            $fig->add_annotation($realID, $user, $text, $timestamp);
            $controlBlock->{stats}->Add("annotation", 1);
            $annotateCount++;
        }
        # Mark the old feature for deletion.
        $fig->delete_feature($fid);
        # Tell the user what we did. The statistics object is stored under
        # the "stats" key of the control block.
        $controlBlock->{stats}->Add("assignments", $assignCount);
        $controlBlock->{stats}->Add("annotations", $annotateCount);
        Trace("Feature $realID created from $fid. $assignCount assignments and $annotateCount annotations copied.") if T(4);
        # Write a corrected transaction to the transaction output file.
        print TRANSOUT "change\t$fid\t$realID\t$locations\t$aliases\t$translation\n";
    }
}

=head3 Delete

C<< Delete($controlBlock, $fid); >>

Delete a feature from the data store. The feature will be marked as deleted,
which will remove it from consideration by most FIG methods. A garbage
collection job will be run later to permanently delete the feature.

=over 4

=item controlBlock

Reference to a hash containing the data structures required to manage feature
transactions.

=item fid

ID of the feature to delete.

=back

=cut

sub Delete {
    my ($controlBlock, $fid) = @_;
    my $fig = $controlBlock->{fig};
    # Extract the feature type and count it.
    my $ftype = FIG::ftype($fid);
    $controlBlock->{stats}->Add($ftype, 1);
    # If we're not counting, delete the feature.
    if ($controlBlock->{command} eq 'process') {
        # Mark the feature for deletion.
        $fig->delete_feature($fid);
        # Echo the transaction to the transaction output file, using the
        # same type name that the input parser recognizes.
        print TRANSOUT "delete\t$fid\n";
    }
}

=head3 ParseNewID

C<< my ($ftype, $ordinal, $key) = ParseNewID($controlBlock, $newID); >>

Extract the feature type and ordinal number from an incoming new ID.

=over 4

=item controlBlock

Reference to a hash containing the data structures needed to manage transactions.

=item newID

New ID specification taken from a transaction input record. This contains the
feature type followed by a period and then the ordinal number of the ID.

=item RETURN

Returns a three-element list. If successful, the list will contain the feature
type followed by the ordinal number and the key to use in the ID hash to find
the feature's true ID number. If the incoming ID is invalid, the list
will contain three C<undef>s.

=back

=cut

sub ParseNewID {
    # Get the parameters.
    my ($controlBlock, $newID) = @_;
    my ($ftype, $ordinal, $key);
    # Parse the ID.
    if ($newID =~ /^([a-z]+)\.(\d+)$/) {
        # Here we have a valid ID.
        ($ftype, $ordinal) = ($1, $2);
        $key = $controlBlock->{genomeID} . ".$ftype";
        # Update the feature type count in the statistics.
        $controlBlock->{stats}->Add($ftype, 1);
    } else {
        # Here we have an invalid ID.
        $controlBlock->{stats}->AddMessage("Invalid ID $newID found in line " .
                                           $controlBlock->{line} . " for genome " .
                                           $controlBlock->{genomeID} . ".");
    }
    # Return the result.
    return ($ftype, $ordinal, $key);
}

=head3 GetRealID

C<< my $realID = GetRealID($controlBlock, $ordinal, $key); >>

Compute the real ID of a new feature. This involves looking up the base ID number
in the ID hash and formatting a complete feature ID from the key and the number.

=over 4

=item controlBlock

Reference to a hash containing data used to manage the transaction process.

=item ordinal

Zero-based ordinal number of this feature. The ordinal number is added to the value
stored in the control block's ID hash to compute the real feature number.

=item key

Key in the ID hash relevant to this feature.

=item RETURN

Returns a fully-formatted FIG ID for the new feature.

=back

=cut

sub GetRealID {
    # Get the parameters.
    my ($controlBlock, $ordinal, $key) = @_;
    # Declare the return value.
    my $retVal;
    # Get the base value for the feature ID number.
    my $base = $controlBlock->{idHash}->{$key};
    # If it didn't exist, we have an error.
    if (! defined $base) {
        Confess("No ID range found for genome ID and feature type $key.");
    } else {
        # Now we have enough data to format the ID.
        my $num = $base + $ordinal;
        $retVal = "fig|$key.$num";
    }
    # Return the result.
    return $retVal;
}

=head3 CheckTranslation

C<< my $actualTranslation = CheckTranslation($controlBlock, $ftype, $locations, $translation); >>

If we are processing a PEG, ensure we have a translation for the peg's locations.

This method checks the feature type and the incoming translation string. If the
translation string is empty and the feature type is C<peg>, it will generate
a translation string using the specified locations for the genome currently
being processed.

=over 4

=item controlBlock

Reference to a hash containing data used to manage the transaction process.

=item ftype

Feature type (C<peg>, C<rna>, etc.)

=item locations

Comma-delimited list of location strings for the feature in question.

=item translation (optional)

If specified, will be returned to the caller as the result.

=item RETURN

Returns the protein translation string for the specified locations, or C<undef>
if no translation is warranted.

=back

=cut

sub CheckTranslation {
    # Get the parameters.
    my ($controlBlock, $ftype, $locations, $translation) = @_;
    my $fig = $controlBlock->{fig};
    # Declare the return variable.
    my $retVal;
    if ($ftype eq 'peg') {
        # Here we have a protein encoding gene. Check to see if we already have
        # a translation. An empty string counts as a missing translation.
        if (defined $translation && $translation ne "") {
            # Pass it back unmodified.
            $retVal = $translation;
        } else {
            # Here we need to compute the translation.
            my $dna = $fig->dna_seq($controlBlock->{genomeID}, $locations);
            $retVal = FIG::translate($dna);
        }
    }
    # Return the result.
    return $retVal;
}

=head3 AddFeature

C<< my $realID = AddFeature($controlBlock, $ordinal, $key, $ftype, $locations, $aliases, $translation); >>

Add the specified feature to the FIG data store. This involves generating the new feature's
ID, creating the translation (if needed), adding the feature to the data store, and
queueing a request to update the similarities. The generated ID will be returned to the
caller.

=over 4

=item controlBlock

Reference to a hash containing the data structures required to manage feature
transactions.

=item ordinal

Zero-based ordinal number of the proposed feature in the ID space. This is added to the
base ID number to get the real ID number.

=item key

Key to use for getting the base ID number from the ID hash.

=item ftype

Proposed feature type (C<peg>, C<rna>, etc.)

=item locations

Location of the new feature, in the form of a comma-separated list of location
strings in SEED format.

=item aliases (optional)

A new list of alias names for the feature.

=item translation (optional)

Protein translation string for the new feature. If this field is omitted and
the feature is a peg, the translation will be generated by normal means.

=back

=cut

sub AddFeature {
    # Get the parameters.
    my ($controlBlock, $ordinal, $key, $ftype, $locations, $aliases, $translation) = @_;
    my $fig = $controlBlock->{fig};
    # We want to add a new feature using the information provided. First, we
    # generate its ID.
    my $retVal = GetRealID($controlBlock, $ordinal, $key);
    # Next, we ensure that we have a translation.
    my $actualTranslation = CheckTranslation($controlBlock, $ftype,
                                             $locations, $translation);
    # An omitted alias list means no aliases for the new feature.
    my $actualAliases = (defined $aliases ? $aliases : "");
    # Now we add it to FIG.
    $fig->add_feature($controlBlock->{genomeID}, $ftype, $locations, $actualAliases,
                      $actualTranslation, $retVal);
    # Tell FIG to recompute the similarities.
    $fig->enqueue_similarities([$retVal]);
    # Return the ID we generated.
    return $retVal;
}

1;