[Bio] / FigKernelPackages / FullLocation.pm Repository:
ViewVC logotype

Annotation of /FigKernelPackages/FullLocation.pm

Parent Directory Parent Directory | Revision Log Revision Log


Revision 1.24 - (view) (download) (as text)

1 : parrello 1.1 #!/usr/bin/perl -w
2 : olson 1.10 #
3 :     # Copyright (c) 2003-2006 University of Chicago and Fellowship
4 :     # for Interpretations of Genomes. All Rights Reserved.
5 :     #
6 :     # This file is part of the SEED Toolkit.
7 : parrello 1.12 #
8 : olson 1.10 # The SEED Toolkit is free software. You can redistribute
9 :     # it and/or modify it under the terms of the SEED Toolkit
10 : parrello 1.12 # Public License.
11 : olson 1.10 #
12 :     # You should have received a copy of the SEED Toolkit Public License
13 :     # along with this program; if not write to the University of Chicago
14 :     # at info@ci.uchicago.edu or the Fellowship for Interpretation of
15 :     # Genomes at veronika@thefig.info or download a copy from
16 :     # http://www.theseed.org/LICENSE.TXT.
17 :     #
18 :    
19 : parrello 1.1
20 :     package FullLocation;
21 :    
22 :     use strict;
23 :     use Tracer;
24 :     use PageBuilder;
25 :     use BasicLocation;
26 :    
27 :     =head1 Full Location Object
28 :    
29 :     =head2 Introduction
30 :    
31 :     A I<full location object> describes a list of basic locations (segments) in a
32 :     particular genome. In addition to an array of basic location objects, it contains
33 :     a genome ID and a reference to a FIG-like object. The FIG-like object is always
34 :     accessed using the variable I<$fig>. This simplifies the process of determining which
35 :     FIG methods must be supported in order to make use of this object's features.
36 :    
37 :     The simplest way to create a full location object is by passing in the genome ID
38 :     and a location string. The location string contains a list of basic locations
39 :     separated by commas. These are converted into location objects and assembled into
40 :     the full location. The full location is considered I<bounded> by the first and last
41 :     basic locations in the list. This bounded region has a B<Left>, B<Right>, B<Begin>, and
42 :     B<Endpoint>, just like a basic location. So, for example, with a location list of
43 :    
44 :     RED_100_250, RED_275_325, RED_330_430
45 :    
46 :     the B<Left> and B<Begin> locations are C<RED_100>, while the B<Right> and B<EndPoint>
47 : parrello 1.23 locations are C<RED_430>. Similarly, with the location list
48 : parrello 1.1
49 :     BLUE_500_450, BLUE_425_300, BLUE_295_200
50 :    
51 : parrello 1.23 the B<Left> and B<EndPoint> locations are C<BLUE_200>, while the B<Right> and B<Begin>
52 : parrello 1.1 locations are C<BLUE_500>.
53 :    
54 :     A location can be converted to a DNA string using the genome ID and data accessible
55 :     through the fig-like object. A location also has a I<translation> that represents the
56 :     protein sequence produced DNA. The translation can be computed from the DNA or can be
57 :     provided by the client. If it is provided by the client, it will automatically be
58 :     extended when the boundaries are moved.
59 :    
60 :     Theoretically, a location can contain basic locations on different contigs and
61 :     pointing in different directions. In practice, this does not occur; therefore,
62 :     the location will be treated as a set of basic locations in a single direction
63 :     on a single contig. If this is not the case, problems will arise with some of
64 :     the methods.
65 :    
66 :     =cut
67 :    
68 :     #: Constructor FullLocation->new();
69 :    
70 :     =head2 Public Methods
71 :    
72 :     =head3 new
73 :    
74 : parrello 1.18 my $loc = FullLocation->new($fig, $genomeID, $locList, $translation, $code);
75 : parrello 1.1
76 :     Construct a new FullLocation object.
77 :    
78 :     =over 4
79 :    
80 :     =item fig
81 :    
82 :     A fig-like object that can be used to get the DNA and translation information.
83 :    
84 :     =item genomeID
85 :    
86 :     ID of the genome containing the location.
87 :    
88 :     =item locList
89 :    
90 :     List of locations. This can be a reference to a list of location strings, a
91 :     comma-delimited list of location strings, or a reference to a list of basic
92 :     location objects.
93 :    
94 :     =item translation (optional)
95 :    
96 :     Protein string representing the DNA inside the boundaries of the location.
97 :    
98 : parrello 1.14 =item code (optional)
99 :    
100 :     Translation code table to use for translating DNA triplets to proteins. If
101 :     none is specified, the standard code table will be used. If a number is
102 :     specified, then the appropriate NCBI code table will be requested from
103 : parrello 1.22 [[FigPm]].
104 : parrello 1.14
105 : parrello 1.1 =back
106 :    
107 :     =cut
108 :    
109 :     sub new {
110 :     # Get the parameters.
111 : parrello 1.14 my ($class, $fig, $genomeID, $locList, $translation, $code) = @_;
112 :     # If there's no code, default to the standard translation code.
113 :     if (! defined $code) {
114 :     $code = FIG::standard_genetic_code();
115 :     } elsif (! ref $code) {
116 :     $code = FIG::genetic_code($code);
117 :     }
118 : parrello 1.1 # Create the $loc object.
119 :     my $retVal = {
120 :     fig => $fig,
121 :     genomeID => $genomeID,
122 : parrello 1.14 translation => $translation,
123 :     code => $code
124 : parrello 1.1 };
125 :     # The tricky part is the location list. Regardless of its incoming format,
126 :     # we must convert it to a list of location objects.
127 :     my $locType = ref $locList;
128 :     if ($locType eq '') {
129 :     # Here we have a comma-delimited list of locations, so we convert it to
130 :     # an array reference.
131 :     $retVal->{locs} = _ParseLocations($retVal, split /\s*,\s*/, $locList);
132 :     } elsif ($locType eq 'ARRAY') {
133 :     # Here we have an array of location objects or strings, which we can
134 :     # pass directly to the parser.
135 :     $retVal->{locs} = _ParseLocations($retVal, @{$locList});
136 :     } else {
137 :     Confess("Invalid location list parameter of type $locType.");
138 :     }
139 :     # Scan the location array to determine the contig and direction. We choose
140 :     # the most popular contig and direction to represent all the locations.
141 :     my %contigs = ();
142 :     my %dirs = ( '+' => 0, '-' => 0 );
143 :     for my $loc (@{$retVal->{locs}}) {
144 :     $dirs{$loc->Dir}++;
145 :     $contigs{$loc->Contig}++
146 :     }
147 :     # Choose the most popular direction and contig.
148 :     $retVal->{dir} = GetBest(%dirs);
149 :     $retVal->{contig} = GetBest(%contigs);
150 :     if ($dirs{$retVal->{dir}} == 0) {
151 :     # Here the location list was empty.
152 :     Confess("Attempt to create a location for genome $genomeID that has no locations.");
153 :     }
154 :     # Bless and return the object.
155 :     bless $retVal, $class;
156 :     return $retVal;
157 :     }
158 :    
159 :     =head3 Locs
160 :    
161 : parrello 1.12 my $locObject = $loc->Locs->[$idx];
162 : parrello 1.1
163 :     Return a reference to the array of location objects.
164 :    
165 : parrello 1.13 B<NOTE>: Do not update the locations, as it will mess up the translation. Use the
166 :     L</Extend> method to change a full location's begin and/or end points.
167 :    
168 : parrello 1.1 =cut
169 :     #: Return Type $@;
170 :     sub Locs {
171 :     return $_[0]->{locs};
172 :     }
173 :    
174 :     =head3 Contig
175 :    
176 : parrello 1.12 my $contigID = $loc->Contig();
177 : parrello 1.1
178 :     Return the ID of the base contig for this location list. The base contig is the
179 :     contig used for most of the locations in the list.
180 :    
181 :     =cut
182 :     #: Return Type $;
183 :     sub Contig {
184 :     # Get the parameters.
185 :     my ($self) = @_;
186 :     # Return the result.
187 :     return $self->{contig};
188 :     }
189 :    
190 :     =head3 Dir
191 :    
192 : parrello 1.12 my $dir = $loc->Dir;
193 : parrello 1.1
194 :     Return the base direction for this location list. The base direction is the direction
195 :     (C<+> or C<->) used for most of the locations in the list.
196 :    
197 :     =cut
198 :     #: Return Type $;
199 :     sub Dir {
200 :     # Get the parameters.
201 :     my ($self) = @_;
202 :     # Return the result.
203 :     return $self->{dir};
204 :     }
205 :    
206 : parrello 1.2 =head3 NextPoint
207 :    
208 : parrello 1.12 my $offset = $loc->NextPoint;
209 : parrello 1.2
210 :     Return the location immediately after the end point of the last location.
211 :    
212 :     =cut
213 :     #: Return Type $;
214 :     sub NextPoint {
215 :     # Get the parameters.
216 :     my ($self) = @_;
217 :     # Get the last location.
218 :     my (undef, undef, $locN) = $self->GetBounds();
219 :     # Return the point after it.
220 :     return $locN->PointOffset($locN->Length);
221 :     }
222 :    
223 :     =head3 PrevPoint
224 :    
225 : parrello 1.12 my $offset = $loc->PrevPoint;
226 : parrello 1.2
227 :     Return the location immediately before the begin point of the first location.
228 :    
229 :     =cut
230 :     #: Return Type $;
231 :     sub PrevPoint {
232 :     # Get the parameters.
233 :     my ($self) = @_;
234 :     # Get the last location.
235 :     my (undef, $loc1, undef) = $self->GetBounds();
236 :     # Return the point after it.
237 :     return $loc1->PointOffset(-1);
238 :     }
239 :    
240 :     =head3 Begin
241 :    
242 : parrello 1.12 my $offset = $loc->Begin;
243 : parrello 1.2
244 :     Return the begin point of the first location.
245 :    
246 :     =cut
247 :     #: Return Type $;
248 :     sub Begin {
249 :     # Get the parameters.
250 :     my ($self) = @_;
251 : parrello 1.4 # Get the first location.
252 : parrello 1.2 my (undef, $loc1, undef) = $self->GetBounds();
253 :     # Return its begin point.
254 :     return $loc1->Begin;
255 :     }
256 :    
257 : parrello 1.4 =head3 EndPoint
258 :    
259 : parrello 1.12 my $offset = $loc->EndPoint;
260 : parrello 1.4
261 :     Return the end point of the last location.
262 :    
263 :     =cut
264 :     #: Return Type $;
265 :     sub EndPoint {
266 :     # Get the parameters.
267 :     my ($self) = @_;
268 :     # Get the last location.
269 :     my (undef, undef, $locN) = $self->GetBounds();
270 :     # Return its end point.
271 :     return $locN->EndPoint;
272 :     }
273 :    
274 : parrello 1.2 =head3 SeedString
275 :    
276 : parrello 1.12 my $string = $loc->SeedString;
277 : parrello 1.2
278 :     Return a comma-delimited list of this object's basic locations, in SEED format.
279 :    
280 :     =cut
281 :     #: Return Type $;
282 :     sub SeedString {
283 :     # Get the parameters.
284 :     my ($self) = @_;
285 :     # Map the location list to SEED strings.
286 :     my @seeds = map { $_->SeedString } @{$self->{locs}};
287 :     # Return the result.
288 :     return join ", ", @seeds;
289 :     }
290 :    
291 :     =head3 Adjusted
292 :    
293 : parrello 1.12 my $offset = $loc->Adjusted($oldOffset, $distance);
294 : parrello 1.2
295 :     Adjust the specified offset by the specified distance in the direction of this
296 :     location. If this is a forward location, the distance is added; if it is a backward
297 :     location, the distance is subtracted.
298 :    
299 :     =over 4
300 :    
301 :     =item oldOffset
302 :    
303 :     Offset to adjust.
304 :    
305 :     =item distance
306 :    
307 :     Distance by which to adjust the offset. This value can be negative.
308 :    
309 :     =item RETURN
310 :    
311 :     Returns a new offset formed by moving the specified distance from the original offset
312 :     in this location's direction.
313 :    
314 :     =back
315 :    
316 :     =cut
317 :     #: Return Type $;
318 :     sub Adjusted {
319 :     # Get the parameters.
320 :     my ($self, $oldOffset, $distance) = @_;
321 :     # Do the adjustment.
322 :     return $oldOffset + ($self->Dir eq '+' ? $distance : -$distance);
323 :     }
324 :    
325 : parrello 1.1 =head3 GetBest
326 :    
327 : parrello 1.12 my $bestKey = FullLocation::GetBest(%hash);
328 : parrello 1.1
329 :     Return the key of the hash element with the highest positive numeric value.
330 :    
331 :     =over 4
332 :    
333 :     =item hash
334 :    
335 :     A hash mapping keys to numbers.
336 :    
337 :     =item RETURN
338 :    
339 :     Returns the key having the highest value. If the hash is empty, or has no non-negative
340 : overbeek 1.6 values, returns C<undef>.
341 : parrello 1.1
342 :     =back
343 :    
344 :     =cut
345 :     #: Return Type $;
346 :     sub GetBest {
347 :     # Get the parameters.
348 :     my (%hash) = @_;
349 :     # Declare the return variable and initialize the best-count.
350 :     my ($retVal, $best) = (undef, 0);
351 :     # Search the hash.
352 :     for my $key (keys %hash) {
353 :     my $value = $hash{$key};
354 :     if ($value >= $best) {
355 :     $retVal = $key;
356 :     $best = $value;
357 :     }
358 :     }
359 :     # Return the result.
360 :     return $retVal;
361 :     }
362 :    
363 :     =head3 DNA
364 :    
365 : parrello 1.12 my $dnaString = $loc->DNA;
366 : parrello 1.1
367 :     Return the complete DNA string for this location.
368 :    
369 :     =cut
370 :     #: Return Type $;
371 :     sub DNA {
372 :     # Get the parameters.
373 :     my ($self) = @_;
374 :     my $fig = $self->{fig};
375 :     # Use the FIG object to extract the DNA.
376 :     my $retVal = $fig->dna_seq($self->{genomeID}, $self->Contig, map { $_->SeedString } @{$self->Locs});
377 :     # Return the result.
378 :     return $retVal;
379 :     }
380 :    
381 : parrello 1.2 =head3 Codon
382 :    
383 : parrello 1.12 my $codon = $loc->Codon($point);
384 : parrello 1.2
385 :     Return the DNA codon at the specified point on this location's contig in this
386 :     location's direction.
387 :    
388 :     =over 4
389 :    
390 :     =item point
391 :    
392 :     Offset into the contig of the codon.
393 :    
394 :     =item RETURN
395 :    
396 :     Returns a three-letter DNA codon from the specified point.
397 :    
398 :     =back
399 :    
400 :     =cut
401 :     #: Return Type $;
402 :     sub Codon {
403 :     # Get the parameters.
404 :     my ($self, $point) = @_;
405 :     # Get the FIG object.
406 :     my $fig = $self->{fig};
407 :     # Compute the codon location.
408 :     my $loc = $self->Contig . "_" . $point . "_" . $self->Adjusted($point,2);
409 :     # Return the DNA.
410 :     return $fig->dna_seq($self->{genomeID}, $loc);
411 :     }
412 :    
413 : parrello 1.1 =head3 Translation
414 :    
415 : parrello 1.12 my $proteinString = $loc->Translation($code, $fixStart);
416 : parrello 1.1
417 :     Return the protein translation of this location's DNA. The first time a
418 :     translation is requested, it will be cached in the object, and returned
419 :     unmodified. It is also possible that a translation specified in the
420 :     constructor exists, in which case it will be returned. Thus, the
421 :     I<$code> and I<$fixStart> parameters only matter on the first call.
422 :    
423 :     =over 4
424 :    
425 :     =item code (optional)
426 :    
427 :     Translation code table, in the form of a hash mapping DNA triples to protein
428 :     letters. If omitted, the standard translation table in C<FIG.pm> will be
429 :     used.
430 :    
431 :     =item fixStart (optional)
432 :    
433 :     TRUE if the first DNA triple should be given special handling, else FALSE.
434 :     If TRUE, then a value of C<TTG> or C<GTG> in the first position will be
435 :     translated to C<M> instead of the value specified in the translation code.
436 :    
437 :     =item RETURN
438 :    
439 :     Returns the protein translation for this location.
440 :    
441 :     =back
442 :    
443 :     =cut
444 :     #: Return Type $;
445 :     sub Translation {
446 :     # Get the parameters.
447 :     my ($self, $code, $fixStart) = @_;
448 :     # Declare the return variable.
449 :     my $retVal;
450 :     # Check for a cached translation.
451 :     if ($self->{translation}) {
452 :     # Return the cahced translation.
453 :     $retVal = $self->{translation};
454 :     } else {
455 : parrello 1.14 # Check for a translation code.
456 :     if (! defined $code) {
457 : olson 1.24 #
458 :     # Don't default to the standard one, you very carefully
459 :     # created one in the constructor. Why not use it.
460 :     #
461 :     $code = $self->{code};
462 : parrello 1.14 }
463 : parrello 1.1 # Here we have to do some work. Extract our DNA.
464 :     my $dna = $self->DNA;
465 :     # Translate it.
466 :     $retVal = FIG::translate($dna, $code, $fixStart);
467 : parrello 1.14 # Chop off the stop codon.
468 :     $retVal =~ s/\*$//;
469 : olson 1.24
470 : parrello 1.14 # Cache the translation and the code table.
471 : olson 1.24 #
472 :     # This probably should be cached based on the code table used. It maybe
473 :     # shouldn't even be cached at all.
474 : parrello 1.1 $self->{translation} = $retVal;
475 : olson 1.24
476 :     #
477 :     # Why would you cache this; it was probably passed in as an exceptional case.
478 :     #$self->{code} = $code;
479 : parrello 1.1 }
480 :     # Return the result.
481 :     return $retVal;
482 :     }
483 :    
484 : parrello 1.23 =head3 ConstrainPoint
485 : parrello 1.9
486 : parrello 1.23 my $constrainedPoint = $loc->ConstrainPoint($point);
487 : parrello 1.9
488 :     Change a point location value so that it fits inside the base contig. If the point
489 :     location is less than 1, it will be set to 1. If it's greater than the length of
490 :     the contig, it will be set to the length of the contig.
491 :    
492 :     =over 4
493 :    
494 :     =item point
495 :    
496 :     Location to be constrained.
497 :    
498 :     =item RETURN
499 :    
500 :     Returns the value of the nearest location that is on the base contig of this location.
501 :    
502 :     =back
503 :    
504 :     =cut
505 :     #: Return Type $;
506 : parrello 1.23 sub ConstrainPoint {
507 : parrello 1.9 # Get the parameters.
508 :     my ($self, $point) = @_;
509 :     # Declare the return variable.
510 :     my $retVal;
511 :     # Check for a value less than 1.
512 :     if ($point < 1) {
513 :     $retVal = 1;
514 :     } else {
515 :     # Check for a value off the end of the contig.
516 :     my $fig = $self->{fig};
517 :     my $contigEnd = $fig->contig_ln($self->{genomeID}, $self->Contig);
518 :     if ($point > $contigEnd) {
519 :     # Bring the point back onto the contig.
520 :     $retVal = $contigEnd;
521 :     } else {
522 :     # Return the incoming point unmodified.
523 :     $retVal = $point;
524 :     }
525 :     }
526 :     # Return the result.
527 :     return $retVal;
528 :     }
529 :    
530 : parrello 1.3 =head3 ExtremeCodon
531 :    
532 : parrello 1.12 my $loc->ExtremeCodon($dir);
533 : parrello 1.3
534 :     Return the most extreme codon in the specified direction. This is not always the most
535 :     extreme location, since the distance to the appropriate edge of the location must be
536 :     a multiple of 3.
537 :    
538 :     =over 4
539 :    
540 :     =item dir
541 :    
542 :     C<first> to get the codon moving away from the beginning of the location, C<last>
543 :     to get the codon moving away from the end of the location.
544 :    
545 :     =item RETURN
546 :    
547 : parrello 1.4 Returns the edge location of the desired codon. If we are going toward the left, this
548 :     is the left point in the codon; if we are going toward the right, this is the right
549 :     point in the codon.
550 : parrello 1.3
551 :     =back
552 :    
553 :     =cut
554 :    
555 :     sub ExtremeCodon {
556 :     # Get the parameters.
557 :     my ($self, $dir) = @_;
558 :     my $fig = $self->{fig};
559 :     # The first task is to determine the starting point and direction for the
560 :     # search. We start by converting the direction to the same format as the
561 :     # location direction.
562 :     my $parity = ($dir eq 'first' ? '-' : '+');
563 : parrello 1.4 # Get the contig length.
564 :     my $contig_len = $fig->contig_ln($self->{genomeID}, $self->Contig);
565 : parrello 1.8 # If we're moving in the opposite direction from the location, we're going to
566 : parrello 1.3 # go toward the beginning of the contig; otherwise, we're going toward the
567 :     # end.
568 :     my ($multiplier, $endPoint);
569 :     if ($parity ne $self->Dir) {
570 :     ($multiplier, $endPoint) = (-3, 1);
571 :     } else {
572 : parrello 1.4 ($multiplier, $endPoint) = (3, $contig_len);
573 : parrello 1.3 }
574 :     # Now we need the start point, which is determined by direction of this method.
575 : parrello 1.4 my $beginPoint = ($parity eq '-' ? $self->Begin : $self->EndPoint);
576 : parrello 1.3 # Compute the number of positions to move and add it to the begin point.
577 :     my $retVal = int(($endPoint - $beginPoint) / $multiplier) * $multiplier +
578 :     $beginPoint;
579 :     # Return the codon found.
580 :     return $retVal;
581 :     }
582 :    
583 : parrello 1.13
584 : parrello 1.2 =head3 Extend
585 :    
586 : parrello 1.12 my = $loc->Extend($newBegin, $newEnd, $trimFlag);
587 : parrello 1.2
588 :     Extend this gene to a new begin point and a new end point. If a translation exists,
589 : parrello 1.5 it will be updated to match the new locations. The I<$trimFlag> indicates whether
590 :     or not it is permissible to shrink the location at either end. If an attempt is made
591 :     to shrink and I<$trimFlag> is not specified, then a fatal error will occur.
592 : parrello 1.2
593 :     =over 4
594 :    
595 :     =item newBegin
596 :    
597 : parrello 1.13 Proposed new beginning offset. If undefined, the begin location will not be changed.
598 :    
599 :     =item newEnd
600 : parrello 1.2
601 : parrello 1.13 Proposed new ending offset. If undefined, the ending location will not be changed.
602 : parrello 1.2
603 : parrello 1.13 =item trimFlag
604 :    
605 :     If TRUE, the begin and end offsets can shrink the location.
606 : parrello 1.2
607 :     =back
608 :    
609 :     =cut
610 :     #: Return Type ;
611 :     sub Extend {
612 :     # Get the parameters.
613 : parrello 1.5 my ($self, $newBegin, $newEnd, $trimFlag) = @_;
614 : parrello 1.2 my $fig = $self->{fig};
615 :     # Get our boundaries.
616 :     my ($boundLoc, $loc1, $locN) = $self->GetBounds;
617 :     # Get the current length of the start location.
618 : parrello 1.5 my $len = $boundLoc->Length;
619 :     # Extend the beginning of the bounds.
620 : parrello 1.13 if (defined $newBegin) {
621 :     $boundLoc->SetBegin($newBegin);
622 :     my $excess = $boundLoc->Length - $len;
623 :     # Insure this is a real extension.
624 :     if ($excess < 0) {
625 :     # Find out if we can trim.
626 :     if (! $trimFlag) {
627 :     # We can't trim, so it's an error.
628 :     Confess("Invalid begin location $newBegin for location " . $boundLoc->String . ".");
629 :     } elsif ($self->{translation}) {
630 :     # We can trim, and a translation exists, so we lop some characters off
631 :     # the front. Note that we divide by -3 because if we're here, the excess
632 :     # is automatically negative.
633 :     my $proteinExcess = int($excess / -3);
634 :     $self->{translation} = substr $self->{translation}, $proteinExcess;
635 :     }
636 :     } elsif ($excess > 0 && $self->{translation}) {
637 :     # Here we have new stuff to translate. Get its location.
638 :     my $excessLoc = BasicLocation->new($loc1->Contig, $newBegin, $loc1->Dir, $excess);
639 :     # Extract its DNA and translate it.
640 :     my $newDNA = $fig->dna_seq($self->{genomeID}, $excessLoc->SeedString);
641 : parrello 1.14 my $newTran = FIG::translate($newDNA, $self->{code});
642 : parrello 1.13 # Prefix the new translation to the old one.
643 :     $self->{translation} = $newTran . $self->{translation};
644 :     }
645 :     # We successfully updated the translation (if necessary), so we adjust the
646 :     # start of the first location in the full location's list.
647 :     $loc1->SetBegin($newBegin);
648 :     # Get the new current length of the bounds location.
649 :     $len = $boundLoc->Length;
650 :     }
651 :     # Extend the ending.
652 :     if (defined $newEnd) {
653 :     $boundLoc->SetEnd($newEnd);
654 :     my $excess = $boundLoc->Length - $len;
655 :     # Insure this is a real extension.
656 :     if ($excess < 0) {
657 :     # Find out if we can trim.
658 :     if (! $trimFlag) {
659 :     # We can't trim, so it's an error.
660 :     Confess("Invalid end location $newEnd for location " . $boundLoc->String . ".");
661 :     } elsif ($self->{translation}) {
662 :     # We can trim, and a translation exists, so we lop off some proteins at
663 :     # the end. Note that we divide by 3 and the excess is negative, so the
664 :     # result will be negative. We use it as a negative length in the substr
665 :     # expression to trim end characters.
666 :     my $negativeProteinExcess = int($excess / 3);
667 :     $self->{translation} = substr $self->{translation}, 0, $negativeProteinExcess;
668 :     }
669 :     } elsif ($excess > 0 && $self->{translation}) {
670 :     # Here we have new stuff to translate. Get its location.
671 :     my $excessLoc = BasicLocation->new($locN->Contig, $boundLoc->PointOffset($len), $locN->Dir,
672 :     $excess);
673 :     # Extract its DNA and translate it.
674 :     my $newDNA = $fig->dna_seq($self->{genomeID}, $excessLoc->SeedString);
675 : parrello 1.14 my $newTran = FIG::translate($newDNA, $self->{code});
676 : parrello 1.13 # Append the new translation to the old one.
677 :     $self->{translation} .= $newTran;
678 :     }
679 :     # Here we sucessfully updated the translation and the update is legal, so
680 :     # we can modify the end of the last location.
681 :     $locN->SetEnd($newEnd);
682 :     }
683 : parrello 1.15 if ($self->{translation}) {
684 :     # Chop the stop codon off the end of the translation.
685 :     $self->{translation} =~ s/\*$//;
686 :     }
687 : parrello 1.2 }
688 :    
689 : parrello 1.14 =head3 ConstrainCodon
690 :    
691 :     my $point = $loc->ConstrainCodon($point, $codonPoint);
692 :    
693 :     Constrain the specified point so that it is inside the bounds of this
694 :     location's contig and its distance to the specified codon point is a
695 :     multiple of 3.
696 :    
697 :     =over 4
698 :    
699 :     =item point
700 :    
701 :     Point index (relative to the contig) to be constrained.
702 :    
703 :     =item codonPoint
704 :    
705 :     Index (relative to the contig) of a point that is the start of a codon.
706 :    
707 :     =item RETURN
708 :    
709 :     Returns the constrained index.
710 :    
711 :     =back
712 :    
713 :     =cut
714 :    
715 :     sub ConstrainCodon {
716 :     # Get the parameters.
717 :     my ($self, $point, $codonPoint) = @_;
718 :     # Declare the return variable.
719 :     my $retVal = $point;
720 :     # Check for too far left.
721 :     if ($retVal < 1) {
722 :     $retVal = ($codonPoint - 1) % 3 + 1;
723 :     } else {
724 :     # Check for too far right.
725 :     my $contigLen = $self->{fig}->contig_ln($self->{genomeID}, $self->Contig);
726 :     if ($retVal > $contigLen) {
727 :     $retVal = $contigLen - ($contigLen - $codonPoint) % 3;
728 :     }
729 :     }
730 :     # Return the result.
731 :     return $retVal;
732 :     }
733 :    
734 :    
735 : parrello 1.1 =head3 GetBounds
736 :    
737 : parrello 1.13 my ($boundLoc, $loc1, $locN) = $loc->GetBounds;
738 : parrello 1.1
739 :     Analyze this location and return information about its boundaries. This includes
740 :     a location for the bounds, the first location, and the last location. The
741 :     bounds essentially define the location as it would be if it were all on a single
742 :     contig in the same direction and had no gaps. The first location is the location
743 :     object containing the begin point of the bounds, and the last location is the
744 :     location object containing the end point of the bounds.
745 :    
746 :     =cut
747 :    
748 :     sub GetBounds {
749 :     # Get the parameters.
750 :     my ($self) = @_;
751 :     # Declare the return variables.
752 :     my ($boundLoc, $loc1, $locN);
753 :     # Get a reference to the location list.
754 :     my $locList = $self->Locs;
755 :     # The most common case is a singleton location list. We handle that first.
756 :     if (@{$locList} == 1) {
757 :     my $bloc = $locList->[0];
758 :     $boundLoc = BasicLocation->new($bloc);
759 :     $loc1 = $bloc;
760 :     $locN = $bloc;
761 :     } else {
762 :     # Here we have a multiple-location list. We search for the leftmost left
763 :     # and rightmost right on the base contig. To do that, we first extract
764 :     # all the eligible locations.
765 :     my $baseContig = $self->Contig;
766 :     my @baseLocs = grep { $_->Contig eq $baseContig } @{$locList};
767 :     # Next we prime the loop with a location popped off the list.
768 :     my $loc0 = pop @baseLocs;
769 :     my ($leftLoc, $rightLoc) = ($loc0, $loc0);
770 :     # Search for the leftmost and rightmost locations.
771 :     for my $loci (@baseLocs) {
772 :     if ($loci->Left < $leftLoc->Left) {
773 :     $leftLoc = $loci;
774 :     }
775 :     if ($loci->Right > $rightLoc->Right) {
776 :     $rightLoc = $loci;
777 :     }
778 :     }
779 :     # Now we have enough information to build the bounding location. First,
780 :     # we get the length.
781 :     my $len = $rightLoc->Right - $leftLoc->Left + 1;
782 :     # Next, we arrange the left and right locations according to the direction.
783 :     if ($self->Dir eq '+') {
784 :     ($loc1, $locN) = ($leftLoc, $rightLoc);
785 :     } else {
786 :     ($loc1, $locN) = ($rightLoc, $leftLoc);
787 :     }
788 :     # Finally, we create the bounding location.
789 :     $boundLoc = BasicLocation->new($baseContig, $loc1->Begin, $self->Dir, $len);
790 :     }
791 :     # Return the results.
792 :     return ($boundLoc, $loc1, $locN);
793 :     }
794 :    
795 : parrello 1.21 =head2 Codon Search Methods
796 :    
797 :     =head3 UpstreamSearch
798 :    
799 :     my $loc = $floc->UpstreamSearch($pattern, $limit);
800 :    
801 :     Search upstream from this location for a codon as identified by the
802 :     specified pattern, stopping at the end of the contig or when the
803 :     specified limit is reached.
804 :    
805 :     =over 4
806 :    
807 :     =item pattern
808 :    
809 :     Codon pattern to search for, expressed as a bar-delimited list of base triplets.
810 :     For example, C<taa|tag|tga> would search for a stop codon.
811 :    
812 :     =item limit (optional)
813 :    
814 :     Maximum number of base pairs to search. Must be a multiple of 3.
815 :    
816 :     =item RETURN
817 :    
818 :     Returns a [[BasicLocationPm]] object for the codon found.
819 :    
820 :     =back
821 :    
822 :     =cut
823 :    
824 :     sub UpstreamSearch {
825 :     # Get the parameters.
826 :     my ($self, $pattern, $limit) = @_;
827 :     # Get the FIG object.
828 :     my $fig = $self->{fig};
829 :     # Declare the return variable.
830 :     my $retVal;
831 :     # Locate the starting and ending positions for the search. The search
832 :     # ends immediately upstream of our begin point.
833 :     my $end = $self->PrevPoint;
834 :     my $start = $end - ($self->Dir . 1) * ($limit - 1);
835 :     # Constrain these values to the inside of the contig.
836 :     $start = $self->ConstrainCodon($start, $self->Begin);
837 :     $end = $self->ConstrainCodon($end, $self->Begin + ($self->Dir . 2));
838 :     # Get the DNA to search. Note we convert it automatically to lower case.
839 :     my $dna = lc $fig->dna_seq($self->{genomeID}, $self->Contig . "_${start}_${end}");
840 :     # Insure the pattern is also lower-case.
841 :     $pattern = lc $pattern;
842 :     # Get the location of the last codon in the dna sequence.
843 :     my $i1 = length($dna) - 3;
844 :     Trace("$i1 base pairs in search.") if T(4);
845 :     Trace("Upsearch DNA translation\n" . FIG::translate($dna, $self->{code})) if T(4);
846 :     for (my $i = $i1; $i >= 0 && ! defined($retVal); $i -= 3) {
847 :     # Check for a match.
848 :     if (substr($dna, $i, 3) =~ /$pattern/) {
849 :     # Compute the actual return value. This will also stop the loop.
850 :     $retVal = BasicLocation->new($self->Contig, $i * ($self->Dir . 1) + $start, $self->Dir, 3);
851 :     }
852 :     }
853 :     # Return the result.
854 :     return $retVal;
855 :     }
856 :    
857 :    
858 :     =head3 DownstreamSearch
859 :    
860 :     my $loc = $floc->DownstreamSearch($pattern, $limit);
861 :    
862 :     Search downstream from this location for a codon as identified by the
863 :     specified pattern, stopping at the end of the contig or when the
864 :     specified limit is reached.
865 :    
866 :     =over 4
867 :    
868 :     =item pattern
869 :    
870 :     Codon pattern to search for, expressed as a bar-delimited list of base triplets.
871 :     For example, C<taa|tag|tga> would search for a stop codon.
872 :    
873 :     =item limit (optional)
874 :    
875 :     Maximum number of base pairs to search. Must be a multiple of 3.
876 :    
877 :     =item RETURN
878 :    
879 : parrello 1.22 Returns a [[BasicLocationPm]] object for the codon found.
880 : parrello 1.21
881 :     =back
882 :    
883 :     =cut
884 :    
885 :     sub DownstreamSearch {
886 :     # Get the parameters.
887 :     my ($self, $pattern, $limit) = @_;
888 :     # Get the FIG object.
889 :     my $fig = $self->{fig};
890 :     # Declare the return variable.
891 :     my $retVal;
892 :     # Locate the starting and ending positions for the search. The search starts
893 :     # immediately downstream of our end point.
894 :     my $start = $self->NextPoint;
895 :     my $end = $start + ($self->Dir . 1) * ($limit - 1);
896 :     # Do a down search to find the codon.
897 :     $retVal = $self->DownSearch($pattern, $start, $end);
898 :     # Return the result.
899 :     return $retVal;
900 :     }
901 :    
902 :    
903 :     =head3 InsideSearch
904 :    
905 :     my $loc = $floc->InsideSearch($pattern);
906 :    
907 :     Search inside this location for a codon as identified by the
908 :     specified pattern.
909 :    
910 :     =over 4
911 :    
912 :     =item pattern
913 :    
914 :     Codon pattern to search for, expressed as a bar-delimited list of base triplets.
915 :     For example, C<taa|tag|tga> would search for a stop codon.
916 :    
917 :     =item RETURN
918 :    
919 :     Returns a [[BasicLocationPm]] object for the codon found.
920 :    
921 :     =back
922 :    
923 :     =cut
924 :    
925 :     sub InsideSearch {
926 :     # Get the parameters.
927 :     my ($self, $pattern, $limit) = @_;
928 :     # Get the FIG object.
929 :     my $fig = $self->{fig};
930 :     # Declare the return variable.
931 :     my $retVal;
932 :     # Locate the starting and ending positions for the search. The search
933 :     # is entirely inside the location.
934 :     my $start = $self->Begin;
935 :     my $end = $self->EndPoint;
936 :     # Do a down search to find the codon.
937 :     $retVal = $self->DownSearch($pattern, $start, $end);
938 :     # Return the result.
939 :     return $retVal;
940 :     }
941 :    
942 :     =head3 DownSearch
943 :    
944 :     my $loc = $floc->DownSearch($pattern, $start, $end);
945 :    
946 :     Search parallel to this location for a codon as identified by the
947 :     specified pattern, starting at the specified start point and stopping
948 :     at the specified end point.
949 :    
950 :     =over 4
951 :    
952 :     =item pattern
953 :    
954 :     Codon pattern to search for, expressed as a bar-delimited list of base triplets.
955 :     For example, C<taa|tag|tga> would search for a stop codon.
956 :    
957 :     =item $start
958 :    
959 :     Starting position for search.
960 :    
961 :     =item $end
962 :    
963 :     Ending position for search.
964 :    
965 :     =item RETURN
966 :    
967 :     Returns a [[BasicLocationPm]] object for the codon found.
968 :    
969 :     =back
970 :    
971 :     =cut
972 :    
973 :     sub DownSearch {
974 :     # Get the parameters.
975 :     my ($self, $pattern, $start, $end) = @_;
976 :     # Get the FIG object.
977 :     my $fig = $self->{fig};
978 :     # Declare the return variable.
979 :     my $retVal;
980 :     # Insure we're inside the contig.
981 :     my $realStart = $self->ConstrainCodon($start, $start);
982 :     my $realEnd = $self->ConstrainCodon($end, $start + ($self->Dir . 2));
983 :     # Get the DNA to search. Note we convert it automatically to lower case.
984 :     my $dna = lc $fig->dna_seq($self->{genomeID}, $self->Contig . "_${realStart}_${realEnd}");
985 :     # Insure the pattern is also lower-case.
986 :     $pattern = lc $pattern;
987 :     # Get the length of the dna sequence.
988 :     my $i1 = length($dna);
989 :     Trace("$i1 base pairs in search.") if T(4);
990 :     Trace("Downsearch DNA translation\n" . FIG::translate($dna, $self->{code})) if T(4);
991 :     for (my $i = 0; $i < $i1 && ! defined($retVal); $i += 3) {
992 :     # Check for a match.
993 :     if (substr($dna, $i, 3) =~ /$pattern/) {
994 :     # Compute the actual return value. This will also stop the loop.
995 :     $retVal = BasicLocation->new($self->Contig, $i * ($self->Dir . 1) + $realStart, $self->Dir, 3);
996 :     }
997 :     }
998 :     # Return the result.
999 :     return $retVal;
1000 :     }
1001 :    
1002 : parrello 1.14 =head3 PickGeneBoundaries
1003 :    
1004 : parrello 1.19 my $rc = $floc->PickGeneBoundaries(-stop => $stopPattern,
1005 :     -start => $startPattern, -limit => $limit);
1006 : parrello 1.14
1007 :     Update this location so that it has a valid start and stop. The basic
1008 :     algorithm used is to search upstream and downstream for stop codons, then
1009 :     search between the stop codons for the first start codon.
1010 :    
1011 :     =over 4
1012 :    
1013 : parrello 1.19 =item stop (optional)
1014 : parrello 1.14
1015 :     Search pattern for the stop codon, encoded as a bar-delimited list of DNA triplets
1016 :     (e.g. C<tta|ata|tag>). If omitted, the default is to look for stop codons in the
1017 :     attached genetic code table.
1018 :    
1019 : parrello 1.19 =item start (optional)
1020 : parrello 1.14
1021 :     Search pattern for the start codon, encoded as a bar-delimited list of DNA triplets.
1022 :     If omitted, the default is C<atg|gtg|ttg>.
1023 :    
1024 :     =item limit (optional)
1025 :    
1026 : parrello 1.18 If a number, then the maximum distance to search when attempting to extend the
1027 :     location. If a [[BasicLocationPm]] object or a location string, then none of the searches will
1028 :     go outside the region spanned by the location. If omitted, the default is a scalar value of C<9000>.
1029 : parrello 1.14
1030 :     =item RETURN
1031 :    
1032 : parrello 1.20 Returns TRUE if successful, FALSE if the process fails.
1033 : parrello 1.14
1034 :     =back
1035 :    
1036 :     =cut
1037 :    
1038 :     sub PickGeneBoundaries {
1039 :     # Get the parameters.
1040 : parrello 1.19 my ($self, %parms) = @_;
1041 : parrello 1.14 # Declare the return variable.
1042 :     my $retVal;
1043 : parrello 1.19 # Get the stop pattern.
1044 :     my $stopPattern = $parms{-stop} || lc join("|", grep { $self->{code}->{$_} eq '*' } keys %{$self->{code}});
1045 :     # Get the start pattern.
1046 :     my $startPattern = $parms{-start} || 'atg|gtg|ttg';
1047 :     # Compute the limits. We have an upstream limit and a downstream limit.
1048 :     my $limit = $parms{-limit};
1049 : parrello 1.18 my ($upLimit, $downLimit);
1050 :     if (! defined $limit) {
1051 :     $upLimit = 9000;
1052 :     $downLimit = 9000;
1053 :     } elsif ($limit =~ /^\d+$/) {
1054 :     $upLimit = $limit;
1055 :     $downLimit = $limit;
1056 :     } else {
1057 :     my $limitLoc = (! ref $limit ? new BasicLocation($limit) : $limit);
1058 :     $upLimit = abs($limitLoc->Begin - $self->Begin);
1059 :     $downLimit = abs($limitLoc->EndPoint - $self->EndPoint);
1060 :     }
1061 : parrello 1.21 # Insure the limits are on codon boundaries.
1062 :     $upLimit -= $upLimit % 3;
1063 :     $downLimit -= $downLimit % 3;
1064 : parrello 1.20 # Save the current boundaries.
1065 :     my $oldBegin = $self->Begin;
1066 :     my $oldEnd = $self->EndPoint;
1067 : parrello 1.21 # Get the contig boundaries.
1068 :     my $contigBegin = $self->ExtremeCodon('first');
1069 :     my $contigEnd = $self->ExtremeCodon('last');
1070 :     # Get the distance to each extreme.
1071 :     my $upExtreme = abs($contigBegin - $oldBegin);
1072 :     my $downExtreme = abs($contigEnd - $oldEnd);
1073 :     Trace("Up: limit = $upLimit, extreme = $upExtreme. Down: limit = $downLimit, extreme = $downExtreme.") if T(4);
1074 : parrello 1.14 # Search upstream for a stop.
1075 : parrello 1.18 my $upLoc = $self->UpstreamSearch($stopPattern, $upLimit);
1076 : parrello 1.15 # Check to see if we found one.
1077 :     my $newBegin;
1078 : parrello 1.21 if (defined $upLoc) {
1079 :     # Here we found a stop, so the new beginning is after the codon found.
1080 :     my $newOrfBegin = $upLoc->PointOffset(3);
1081 :     $self->Extend($newOrfBegin);
1082 : parrello 1.18 # Now look for a start between the stop codon found and the end of the location.
1083 :     my $startLoc = $self->InsideSearch($startPattern);
1084 : parrello 1.21 if (! defined $startLoc) {
1085 : parrello 1.19 Trace("New start not found for " . $self->SeedString() . ".") if T(3);
1086 :     } else {
1087 : parrello 1.21 # We found the start, so it becomes our new beginning.
1088 :     $newBegin = $startLoc->Begin;
1089 :     }
1090 :     } elsif ($upExtreme <= $upLimit) {
1091 :     # Here we fell off the end of the contig while searching, so this is ok.
1092 :     Trace("Contig beginning used for stop codon after upstream search.") if T(3);
1093 :     $newBegin = $contigBegin;
1094 :     } else {
1095 :     Trace("Upstream stop not found for " . $self->SeedString() . ".") if T(3);
1096 :     }
1097 :     # Only proceed if we have a new beginning.
1098 :     if (defined $newBegin) {
1099 :     # Search downstream for a stop codon.
1100 :     my $stopLoc = $self->DownstreamSearch($stopPattern, $downLimit);
1101 :     # Check the result.
1102 :     my $endPoint;
1103 :     if (defined $stopLoc) {
1104 :     # Here we found the end point.
1105 :     $endPoint = $stopLoc->EndPoint;
1106 :     } elsif ($downLimit >= $downExtreme) {
1107 :     # Here we fell off the end, so we use the end.
1108 :     $endPoint = $contigEnd;
1109 :     Trace("Contig end used for stop codon after downstream search.") if T(3);
1110 :     } else {
1111 :     Trace("Downsream stop not found for " . $self->SeedString() . ".") if T(3);
1112 :     }
1113 :     if (defined $endPoint) {
1114 :     # Extend the location to the start and stop found. This will almost certainly
1115 :     # require trimming in the start-codon direction.
1116 :     $self->Extend($newBegin, $endPoint, 'trim');
1117 :     # Denote we've succeeded.
1118 :     $retVal = 1;
1119 : parrello 1.18 }
1120 : parrello 1.14 }
1121 : parrello 1.20 # If we failed, restore the location.
1122 :     if (! $retVal) {
1123 :     $self->Extend($oldBegin, $oldEnd, 'trim');
1124 :     }
1125 : parrello 1.14 # Return the success indicator.
1126 :     return $retVal;
1127 :     }
1128 :    
1129 : parrello 1.9
1130 : parrello 1.1 =head2 Internal Methods
1131 :    
1132 :     =head3 ParseLocations
1133 :    
1134 :     Parse an array into a list of basic location objects. The array can contain
1135 :     basic location objects or location strings. The first parameter must be the
1136 :     full location object being constructed.
1137 :    
1138 :     This is a static method.
1139 :    
1140 :     =cut
1141 :    
1142 :     sub _ParseLocations {
1143 :     # Get the location list.
1144 :     my ($parent, @locs) = @_;
1145 :     # Create the return array.
1146 :     my @retVal = ();
1147 :     # Create a location index counter.
1148 :     my $idx = 0;
1149 :     # Loop through the locations.
1150 :     for my $loc (@locs) {
1151 :     # Create a variable to hold the location object created.
1152 :     my $locObject;
1153 :     # Check to see if this is a string or a location object.
1154 :     if (ref $loc eq '') {
1155 :     # It's a string, so parse it into a location object.
1156 :     $locObject = BasicLocation->new($loc, $parent, $idx);
1157 :     } elsif (UNIVERSAL::isa($loc, "BasicLocation")) {
1158 :     # It's a location object, so copy it and set the parent and
1159 :     # index.
1160 :     $locObject = BasicLocation->new($loc);
1161 :     $locObject->Attach($parent, $idx);
1162 :     } else {
1163 :     # Here we have an error.
1164 :     my $type = ref $loc;
1165 :     Confess("Invalid location object of type $type found at index $idx.");
1166 :     }
1167 :     # Add the location to the list.
1168 :     push @retVal, $locObject;
1169 :     $idx++;
1170 :     }
1171 :     # Return a reference to the location list.
1172 :     return \@retVal;
1173 :     }
1174 :    
1175 :     =head3 FindPattern
1176 :    
1177 :     Locate the index of a specified pattern in a DNA string.
1178 :    
1179 :     This is a static method.
1180 :    
1181 :     =over 4
1182 :    
1183 :     =item dna
1184 :    
1185 :     DNA string to search.
1186 :    
1187 :     =item pattern
1188 :    
1189 : parrello 1.2 Pattern for which to search (see L</Search>).
1190 : parrello 1.1
1191 :     =item RETURN
1192 :    
1193 :     Returns the index of the specified pattern, or C<undef> if the pattern is not found.
1194 :    
1195 :     =back
1196 :    
1197 :     =cut
1198 :     #: Return Type $;
1199 :     sub _FindPattern {
1200 :     # Get the parameters.
1201 : parrello 1.2 my ($dna, $pattern) = @_;
1202 : parrello 1.1 # Declare the return variable.
1203 :     my $retVal;
1204 :     # Insure the pattern is lower case.
1205 :     my $realPattern = lc $pattern;
1206 : parrello 1.9 # Start at the beginning of the string. We will chop stuff off the string
1207 :     # as we search through it. This smount chopped must then be added to the offset
1208 :     # of the found string in order to get the return value.
1209 :     my $pos = 0;
1210 :     # Do the search. We search for stop codons, and stop at the first one
1211 :     # that lies on a codon boundary.
1212 :     while (!defined($retVal) && $dna =~ m/$realPattern/g) {
1213 :     # We have a match. Get its location. Note that the "pos" function returns
1214 :     # the point where the search left off, so we have to back off a bit to
1215 :     # get the starting point of the codon.
1216 :     my $newPos = (pos $dna) - 3;
1217 :     # See if this is on a codon boundary.
1218 :     my $mod = $newPos % 3;
1219 :     if ($mod == 0) {
1220 :     # It is, so return the offset from the original start of the string.
1221 :     $retVal = $pos + $newPos;
1222 :     } else {
1223 :     # Here we need to keep searching. First, however, we move the search
1224 :     # position forward to the next codon boundary. This avoids useless checking
1225 :     # of the next byte or two and insures that we don't find the same value
1226 :     # again.
1227 :     $newPos += 3 - $mod;
1228 :     $pos += $newPos;
1229 :     $dna = substr($dna, $newPos);
1230 :     }
1231 :     }
1232 :     # Return the result.
1233 :     return $retVal;
1234 :     }
1235 :    
1236 : parrello 1.1
1237 :     1;
1238 :    

MCS Webmaster
ViewVC Help
Powered by ViewVC 1.0.3