[Bio] / FigKernelPackages / BasicLocation.pm Repository:
ViewVC logotype

Annotation of /FigKernelPackages/BasicLocation.pm

Parent Directory Parent Directory | Revision Log Revision Log


Revision 1.6 - (view) (download) (as text)

1 : olson 1.5 #
2 :     # Copyright (c) 2003-2006 University of Chicago and Fellowship
3 :     # for Interpretations of Genomes. All Rights Reserved.
4 :     #
5 :     # This file is part of the SEED Toolkit.
6 :     #
7 :     # The SEED Toolkit is free software. You can redistribute
8 :     # it and/or modify it under the terms of the SEED Toolkit
9 :     # Public License.
10 :     #
11 :     # You should have received a copy of the SEED Toolkit Public License
12 :     # along with this program; if not write to the University of Chicago
13 :     # at info@ci.uchicago.edu or the Fellowship for Interpretation of
14 :     # Genomes at veronika@thefig.info or download a copy from
15 :     # http://www.theseed.org/LICENSE.TXT.
16 :     #
17 :    
18 : parrello 1.1 package BasicLocation;
19 :    
20 :     use strict;
21 :     use Tracer;
22 :    
23 :     =head1 Basic Location Object
24 :    
25 :     =head2 Introduction
26 :    
27 :     A I<basic location> defines a region of a contig. Because of the way genetic features
28 :     work, a basic location has a direction associated with it. The traditional method for encoding
29 :     a basic location is to specify the contig ID, the starting offset, and the ending offset. If the
30 :     start is before the end, we have a forward basic location, and if the start is after the end, we
31 :     have a backward basic location.
32 :    
33 :     =over 4
34 :    
35 :     =item C<RED_1_400>
36 :    
37 :     is the first 400 nucleotides of the RED contig, processed forward.
38 :    
39 :     =item C<NC_000913_500_401>
40 :    
41 :     is a 100-nucleotide region in NC_000913 that is processed backward from the 500th nucleotide.
42 :    
43 :     =back
44 :    
45 :     Note that even though they are called "offsets", location indices are 1-based.
46 :     Note also that the possibility of an underscore in the contig ID makes the parsing a little
47 :     tricky.
48 :    
49 :     The Sprout uses a slightly different format designed to allow for the possibility of
50 :     zero-length regions. Instead of a starting and ending position, we specify the start position,
51 :     the direction (C<+> or C<->), and the length. The Sprout versions of the two example locations
52 :     above are
53 :    
54 :     =over 4
55 :    
56 :     =item C<RED_1+400>
57 :    
58 :     corresponds to C<RED_1_400>
59 :    
60 :     =item C<NC_000913_500-100>
61 :    
62 :     corresponds to C<NC_000913_500_401>
63 :    
64 :     =back
65 :    
66 :     Working with the raw location string is difficult because it can have one of two formats
67 :     and it is constantly necessary to ask if the location is forward or backward. The basic location
68 :     object seeks to resolve these differences by providing a single interface that can be
69 :     used regardless of the format or direction.
70 :    
71 :     It is frequently useful to keep additional data about a basic location while it is being passed
72 :     around. The basic location object is a PERL hash, and this additional data is kept in the object
73 :     by adding hash keys. The internal values used by the object have keys preceded by an
74 :     underscore, so any keys not beginning with underscores are considered to be additional
75 :     values. The additional values are called I<augments>.
76 :    
77 :     When a basic location is in its string form, the augments can be tacked on using parentheses
78 :     enclosing a comma-delimited list of assignments. For example, say we want to describe
79 :     the first 400 base pairs in the contig B<RED>, and include the fact that it is the second
80 :     segment of feature B<fig|12345.1.peg.2>. We could use the key C<fid> for the feature ID and
81 :     C<idx> for the segment index (0-based), in which case the location string would be
82 :    
83 :     RED_1+400(fid=fig|12345.1.peg.2,idx=1)
84 :    
85 :     When this location string is converted to a location object in the variable C<$loc>, we
86 :     would have
87 :    
88 :     $loc->{fig} eq 'fig|12345.1.peg.2'
89 :     $loc->{idx} == 1
90 :    
91 :     Spaces can be added for readability. The above augmented location string can also be
92 :     coded as
93 :    
94 :     RED_1+400(fid = fig|12345.1.peg.2, idx = 1)
95 :    
96 :     A basic location is frequently part of a full location. Full locations are described by the
97 :     B<FullLocation> object. A full location is a list of basic locations associated with a genome
98 :     and a FIG-like object. If the parent full location is known, we can access the basic location's
99 :     raw DNA. To construct a basic location that is part of a full location, we add the parent full
100 :     location and the basic location's index to the constructor. In the constructor below,
101 :     C<$parent> points to the parent full location.
102 :    
103 :     my $secondLocation = BasicLocation->new("RED_450+100", $parent, 1);
104 :    
105 :     =cut
106 :    
107 :     #: Constructor BasicLocation->new('NC_000913_499_400');
108 :    
109 :     =head2 Public Methods
110 :    
111 :     =head3 new
112 :    
113 :     C<< my $loc = BasicLocation->new($locString, $parentLocation, $idx); >>
114 :    
115 :     Construct a basic location from a location string. A location string has the form
116 :     I<contigID>C<_>I<begin>I<dir>I<len> where I<begin> is the starting position,
117 :     I<dir> is C<+> for a forward transcription or C<-> for a backward transcription,
118 :     and I<len> is the length. So, for example, C<1999.1_NC123_4000+200> describes a
119 :     location beginning at position 4000 of contig C<1999.1_NC123> and ending at
120 :     position 4199. Similarly, C<1999.1_NC123_2000-400> describes a location in the
121 :     same contig starting at position 2000 and ending at position 1601.
122 :    
123 :     Augments can be specified as part of the location string using parentheses and
124 :     a comma-delimited list of assignments. For example, the following constructor
125 :     creates a location augmented by a feature ID called C<fid> and an index value
126 :     called C<idx>.
127 :    
128 :     my $loc = BasicLocation->new("NC_000913_499_400(fid = fig|83333.1.peg.10, idx => 2)");
129 :    
130 :     All fields internal to the location object have names beginning with an
131 :     underscore (C<_>), so as long as the value name begins with a letter,
132 :     there should be no conflict.
133 :    
134 :     =over 4
135 :    
136 :     =item locString
137 :    
138 :     Location string, as described above.
139 :    
140 :     =item parentLocation (optional)
141 :    
142 :     Full location that this basic location is part of (if any).
143 :    
144 :     =item idx (optional)
145 :    
146 :     Index of this basic location in the parent full location.
147 :    
148 :     =back
149 :    
150 :     C<< my $loc = BasicLocation->new($location, $contigID); >>
151 :    
152 :     Construct a location by copying another location and plugging in a new contig ID.
153 :    
154 :     =over 4
155 :    
156 :     =item location
157 :    
158 :     Location whose data is to be copied.
159 :    
160 :     =item contigID (optional)
161 :    
162 :     ID of the new contig to be plugged in.
163 :    
164 :     =back
165 :    
166 :     C<< my $loc = BasicLocation->new($contigID, $beg, $dir, $len, $augments, $parentLocation, $idx); >>
167 :    
168 :     Construct a location from specific data elements, in particular the contig ID, the starting
169 :     offset, the direction, and the length.
170 :    
171 :     =over 4
172 :    
173 :     =item contigID
174 :    
175 :     ID of the contig on which the location occurs.
176 :    
177 :     =item beg
178 :    
179 :     Starting offset of the location.
180 :    
181 :     =item dir
182 :    
183 :     Direction of the location: C<+> for a forward location and C<-> for a backward location. If
184 :     C<_> is specified instead, it will be presumed that the fourth argument is an endpoint and not
185 :     a length.
186 :    
187 :     =item len
188 :    
189 :     Length of the location. If the direction is an underscore (C<_>), it will be the endpoint
190 :     instead of the length.
191 :    
192 :     =item augments (optional)
193 :    
194 :     Reference to a hash containing any augment values for the location.
195 :    
196 :     =item parentLocation (optional)
197 :    
198 :     Full location that this basic location is part of (if any).
199 :    
200 :     =item idx (optional)
201 :    
202 :     Index of this basic location in the parent full location.
203 :    
204 :     =back
205 :    
206 :     =cut
207 :    
208 :     sub new {
209 :     # Get the parameters.
210 :     my ($class, @p) = @_;
211 :     # Declare the data variables.
212 :     my ($contigID, $beg, $dir, $len, $end, $parent, $idx, $augments, $augmentString);
213 :     # Determine the signature type.
214 :     if (@p >= 4) {
215 :     # Here we have specific incoming data.
216 :     ($contigID, $beg, $dir, $len, $augments, $parent, $idx) = @p;
217 :     } elsif (UNIVERSAL::isa($p[0],__PACKAGE__)) {
218 :     # Here we have a source location and possibly a new contig ID.
219 :     $contigID = (defined $p[1] ? $p[1] : $p[0]->{_contigID});
220 :     ($beg, $dir, $len) = ($p[0]->{_beg}, $p[0]->{_dir}, $p[0]->{_len});
221 :     if (exists $p[0]->{_parent}) {
222 :     ($parent, $idx) = ($p[0]->{_parent}, $p[0]->{_idx});
223 :     }
224 :     # Get the augments (if any) from the source location. We want these
225 :     # copied to the new location.
226 :     $augments = { };
227 :     for my $key (keys %{$p[0]}) {
228 :     if (substr($key, 0, 1) ne '_') {
229 :     $augments->{$key} = $p[0]->{$key};
230 :     }
231 :     }
232 :     } else {
233 :     # Here we have a source string and possibly augments. We first parse
234 :     # the source string.
235 :     $p[0] =~ /^(.+)_(\d+)(\+|\-|_)(\d+)($|\(.*\)$)/;
236 :     ($contigID, $beg, $dir, $len, $augmentString) = ($1, $2, $3, $4, $5);
237 :     # Check for augments.
238 :     if ($augmentString) {
239 :     # Here we have an augment string. First, we strip the enclosing
240 :     # parentheses.
241 :     $augmentString = substr $augmentString, 1, length($augmentString) - 2;
242 :     # Now we parse out the assignments and put them in a hash.
243 :     my %augmentHash = map { split /\s*,\s*/, $_ } split /\s*=\s*/, $augmentString;
244 :     $augments = \%augmentHash;
245 :     }
246 :     # Pull in the parent location and index, if applicable.
247 :     ($parent, $idx) = ($p[1], $p[2]);
248 :     }
249 :     # Determine the format.
250 :     if ($dir eq '_') {
251 :     # Here we have the old format. The endpoint was parsed as the length.
252 :     $end = $len;
253 :     # Compare the start and end to get the direction and compute the true length.
254 :     if ($beg > $end) {
255 :     ($dir, $len) = ('-', $beg - $end + 1);
256 :     } else {
257 :     ($dir, $len) = ('+', $end - $beg + 1);
258 :     }
259 :     } else {
260 :     # Here we have the new format. We compute the endpoint
261 :     # from the direction.
262 :     $end = (($dir eq '+') ? $beg + $len - 1 : $beg - $len + 1);
263 :     }
264 :     # Create the return structure.
265 :     my $retVal = { _contigID => $contigID, _beg => $beg, _dir => $dir,
266 :     _end => $end, _len => $len, _parent => $parent,
267 :     _idx => $idx };
268 :     # Add the augments.
269 :     if ($augments) {
270 :     for my $key (keys %{$augments}) {
271 :     $retVal->{$key} = $augments->{$key};
272 :     }
273 :     }
274 :     # Bless the location with the appropriate package name.
275 :     if ($dir eq '+') {
276 :     bless $retVal, "FBasicLocation";
277 :     } else {
278 :     bless $retVal, "BBasicLocation";
279 :     }
280 :     # Return the blessed object.
281 :     return $retVal;
282 :     }
283 :    
284 :     =head3 Contig
285 :    
286 :     C<< my $contigID = $loc->Contig; >>
287 :    
288 :     Return the location's contig ID.
289 :    
290 :     =cut
291 :     #: Return Type $;
292 :     sub Contig {
293 :     return $_[0]->{_contigID};
294 :     }
295 :    
296 :     =head3 Begin
297 :    
298 :     C<< my $beg = $loc->Begin; >>
299 :    
300 :     Return the location's starting offset.
301 :    
302 :     =cut
303 :     #: Return Type $;
304 :     sub Begin {
305 :     return $_[0]->{_beg};
306 :     }
307 :    
308 :     =head3 Dir
309 :    
310 :     C<< my $dirChar = $loc->Dir; >>
311 :    
312 :     Return the location's direction: C<+> for a forward location and C<-> for a backward one.
313 :    
314 :     =cut
315 :     #: Return Type $;
316 :     sub Dir {
317 :     return $_[0]->{_dir};
318 :     }
319 :    
320 :     =head3 Length
321 :    
322 :     C<< my $len = $loc->Length; >>
323 :    
324 :     Return the location's length (in nucleotides).
325 :    
326 :     =cut
327 :     #: Return Type $;
328 :     sub Length {
329 :     return $_[0]->{_len};
330 :     }
331 :    
332 :     =head3 EndPoint
333 :    
334 :     C<< my $offset = $loc->EndPoint; >>
335 :    
336 :     Return the location's ending offset.
337 :    
338 :     =cut
339 :     #: Return Type $;
340 :     sub EndPoint {
341 :     return $_[0]->{_end};
342 :     }
343 :    
344 :     =head3 Parent
345 :    
346 :     C<< my $parentLocation = $loc->Parent; >>
347 :    
348 :     Return the full location containing this basic location (if any).
349 :    
350 :     =cut
351 :     #: Return Type %;
352 :     sub Parent {
353 :     return $_[0]->{_parent};
354 :     }
355 :    
356 :     =head3 Index
357 :    
358 :     C<< my $idx = $loc->Index; >>
359 :    
360 :     Return the index of this basic location inside the parent location (if any).
361 :    
362 :     =cut
363 :     #: Return Type $;
364 :     sub Index {
365 :     return $_[0]->{_idx};
366 :     }
367 :    
368 :     =head3 String
369 :    
370 :     C<< my $string = $loc->String; >>
371 :    
372 :     Return a Sprout-format string representation of this location.
373 :    
374 :     =cut
375 :     #: Return Type $;
376 :     sub String {
377 :     my ($self) = @_;
378 :     return $self->{_contigID} . "_" . $self->{_beg} . $self->{_dir} . $self->{_len};
379 :     }
380 :    
381 :     =head3 SeedString
382 :    
383 :     C<< my $string = $loc->SeedString; >>
384 :    
385 :     Return a SEED-format string representation of this location.
386 :    
387 :     =cut
388 :     #: Return Type $;
389 :     sub SeedString {
390 :     my ($self) = @_;
391 :     return $self->{_contigID} . "_" . $self->{_beg} . "_" . $self->{_end};
392 :     }
393 :    
394 :     =head3 AugmentString
395 :    
396 :     C<< my $string = $loc->AugmentString; >>
397 :    
398 :     Return a Sprout-format string representation of this location with augment data
399 :     included. The augment data will be appended as a comma-delimited list of assignments
400 :     enclosed in parentheses, the exact format expected by the single-argument location object
401 :     constructor L</new>.
402 :    
403 :     =cut
404 :     #: Return Type $;
405 :     sub AugmentString {
406 :     # Get this instance.
407 :     my ($self) = @_;
408 :     # Get the pure location string.
409 :     my $retVal = $self->String;
410 :     # Create the augment string. We build it from all the key-value pairs in the hash
411 :     # for which the key does not being with an underscore.
412 :     my @augmentStrings = ();
413 :     for my $key (sort keys %{$self}) {
414 :     if (substr($key,0,1) ne "_") {
415 :     push @augmentStrings, "$key = " . $self->{$key};
416 :     }
417 :     }
418 :     # If any augments were found, we concatenate them to the result string.
419 :     if (@augmentStrings > 0) {
420 :     $retVal .= "(" . join(", ", @augmentStrings) . ")";
421 :     }
422 :     # Return the result.
423 :     return $retVal;
424 :     }
425 :    
426 :     =head3 IfValid
427 :    
428 :     C<< my $distance = IfValid($distance); >>
429 :    
430 :     Return a distance if it is a valid offset inside this location, and an undefined value otherwise.
431 :    
432 :     =over 4
433 :    
434 :     =item distance
435 :    
436 :     Relevant distance inside this location.
437 :    
438 :     =item RETURN
439 :    
440 :     Returns the incoming distance if it is non-negative and less than the location length, and an
441 :     undefined value otherwise.
442 :    
443 :     =back
444 :    
445 :     =cut
446 :     #: Return Type $;
447 :     sub IfValid {
448 :     # Get the parameters.
449 :     my ($self, $distance) = @_;
450 :     # Return the appropriate result.
451 :     return (($distance >= 0 && $distance < $self->{_len}) ? $distance : undef);
452 :     }
453 :    
454 :     =head3 Cmp
455 :    
456 :     C<< my $compare = BasicLocation::Cmp($a, $b); >>
457 :    
458 :     Compare two locations.
459 :    
460 :     The ordering principle for locations is that they are sorted first by contig ID, then by
461 :     leftmost position, in reverse order by length, and then by direction. The effect is that
462 :     within a contig, the locations are ordered first and foremost in the way they would
463 :     appear when displayed in a picture of the contig and second in such a way that embedded
464 :     locations come after the locations in which they are embedded. In the case of two
465 : parrello 1.3 locations that represent the exact same base pairs, the forward (C<+>) location is
466 :     arbitrarily placed first.
467 : parrello 1.1
468 :     =over 4
469 :    
470 :     =item a
471 :    
472 :     First location to compare.
473 :    
474 :     =item b
475 :    
476 :     Second location to compare.
477 :    
478 :     =item RETURN
479 :    
480 :     Returns a negative number if the B<a> location sorts first, a positive number if the
481 :     B<b> location sorts first, and zero if the two locations are the same.
482 :    
483 :     =back
484 :    
485 :     =cut
486 :     #: Return Type $;
487 :     sub Cmp {
488 :     # Get the parameters.
489 :     my ($a, $b) = @_;
490 :     # Compare the locations.
491 :     my $retVal = ($a->Contig cmp $b->Contig);
492 :     if ($retVal == 0) {
493 :     $retVal = ($a->Left <=> $b->Left);
494 :     if ($retVal == 0) {
495 :     $retVal = ($b->Length <=> $a->Length);
496 :     if ($retVal == 0) {
497 :     $retVal = ($a->Begin <=> $b->Begin);
498 :     }
499 :     }
500 :     }
501 :     # Return the result.
502 :     return $retVal;
503 :     }
504 :    
505 :     =head3 Matches
506 :    
507 :     C<< my $flag = BasicLocation::Matches($locA, $locB); >>
508 :    
509 :     Return TRUE if the two locations contain the same data, else FALSE. Augment data is included
510 :     in the comparison.
511 :    
512 :     =over 4
513 :    
514 :     =item locA, locB
515 :    
516 :     Locations to compare.
517 :    
518 :     =item RETURN
519 :    
520 :     Returns TRUE if the two locations contain the same data, else FALSE.
521 :    
522 :     =back
523 :    
524 :     =cut
525 :     #: Return Type $;
526 :     sub Matches {
527 :     # Get the parameters.
528 :     my ($locA, $locB) = @_;
529 :     # Declare the return variable.
530 :     my $retVal = 0;
531 :     # Verify that the major data items are the same.
532 :     if ($locA->Contig eq $locB->Contig && $locA->Begin eq $locB->Begin &&
533 :     $locA->Dir eq $locB->Dir && $locA->Length == $locB->Length) {
534 :     # Here the locations are the same, so we need to check augment data.
535 :     # First, we loop through all the augment keys in the A location.
536 :     my @aKeys = grep { /^[^_]/ } keys %{$locA};
537 :     # Assume we have a match until we find a mis-match.
538 :     $retVal = 1;
539 :     for (my $i = 0; $i <= $#aKeys && $retVal; $i++) {
540 :     my $aKey = $aKeys[$i];
541 :     $retVal = ((exists $locB->{$aKey}) && ($locA->{$aKey} eq $locB->{$aKey}));
542 :     }
543 :     # If we're still matching, verify that B doesn't have any
544 :     # keys not in A.
545 :     my @bKeys = keys %{$locB};
546 :     for (my $i = 0; $i <= $#bKeys && $retVal; $i++) {
547 :     $retVal = exists $locA->{$bKeys[$i]};
548 :     }
549 :     }
550 :     # Return the result.
551 :     return $retVal;
552 :     }
553 :    
554 :     =head3 Attach
555 :    
556 :     C<< my = $loc->Attach($parent, $idx); >>
557 :    
558 :     Point this basic location to a parent full location. The basic location will B<not> be
559 :     inserted into the full location's data structures.
560 :    
561 :     =over 4
562 :    
563 :     =item parent
564 :    
565 :     Parent full location to which this location should be attached.
566 :    
567 :     =item idx
568 :    
569 :     Index of this location in the full location.
570 :    
571 :     =back
572 :    
573 :     =cut
574 :     #: Return Type ;
575 :     sub Attach {
576 :     # Get the parameters.
577 :     my ($self, $parent, $idx) = @_;
578 :     # Save the parent location and index in our data structures.
579 :     $self->{_idx} = $idx;
580 :     $self->{_parent} = $parent;
581 :     }
582 :    
583 :     =head2 Virtual Methods
584 :    
585 :     These methods are implemented by the subclasses. They are included here for documentation
586 :     purposes.
587 :    
588 :     =head3 Left
589 :    
590 :     C<< my $leftPoint = $loc->Left; >>
591 :    
592 :     Return the offset of the leftmost point of the location.
593 :    
594 :     =cut
595 :     #: Return Type $;
596 :    
597 :     =head3 Right
598 :    
599 :     C<< my $rightPoint = $loc->Right; >>
600 :    
601 :     Return the offset of the rightmost point of the location.
602 :    
603 :     =cut
604 :     #: Return Type $;
605 :    
606 :     =head3 Compare
607 :    
608 :     C<< my ($distance, $cmp) = $loc->Compare($point); >>
609 :    
610 :     Determine the relative location of the specified point on the contig. Returns a distance,
611 :     which indicates the location relative to the leftmost point of the contig, and a comparison
612 :     number, which is negative if the point is to the left of the location, zero if the point is
613 :     inside the location, and positive if the point is to the right of the location.
614 :    
615 :     =cut
616 :     #: Return Type @;
617 :    
618 :     =head3 Split
619 :    
620 :     C<< my $newLocation = $loc->Split($offset); >>
621 :    
622 :     Split this location into two smaller ones at the specified offset from the left endpoint. The
623 :     new location split off of it will be returned.
624 :    
625 :     =over 4
626 :    
627 :     =item offset
628 :    
629 :     Offset into the location from the left endpoint of the point at which it should be split.
630 :    
631 :     =item RETURN
632 :    
633 :     The new location split off of this one.
634 :    
635 :     =back
636 :    
637 :     =cut
638 :     #: Return Type $%;
639 :    
640 :     =head3 Reverse
641 :    
642 :     C<< $loc->Reverse; >>
643 :    
644 :     Change the polarity of the location. The location will have the same nucleotide range, but
645 :     the direction will be changed.
646 :    
647 :     =cut
648 :     #: Return Type ;
649 :    
650 :     =head3 Index
651 :    
652 :     C<< my $index = $loc->Index($point); >>
653 :    
654 :     Return the index of the specified point in this location. The value returned is the distance
655 :     from the beginning. If the specified point is not in the location, an undefined value is returned.
656 :    
657 :     =over 4
658 :    
659 :     =item point
660 :    
661 :     Offset into the contig of the point in question.
662 :    
663 :     =item RETURN
664 :    
665 :     Returns the distance of the point from the beginning of the location, or an undefined value if the
666 :     point is outside the location.
667 :    
668 :     =back
669 :    
670 :     =cut
671 :     #: Return Type $;
672 :    
673 :     =head3 PointOffset
674 :    
675 :     C<< my $offset = $loc->PointOffset($index); >>
676 :    
677 :     Return the offset into the contig of the point at the specified position in the location. A position
678 :     of 0 will return the beginning point, a position of 1 returns the point next to that, and a position
679 :     1 less than the length will return the ending point.
680 :    
681 :     =over 4
682 :    
683 :     =item index
684 :    
685 :     Index into the location of the relevant point.
686 :    
687 :     =item RETURN
688 :    
689 :     Returns an offset into the contig of the specified point in the location.
690 :    
691 :     =back
692 :    
693 :     =cut
694 :     #: Return Type $;
695 :    
696 :     =head3 Peel
697 :    
698 :     C<< my $peel = $loc->Peel($length); >>
699 :    
700 :     Peel a specified number of positions off the beginning of the location. Peeling splits
701 :     a location at a specified offset from the beginning, while splitting takes it at a
702 :     specified offset from the left point. If the specified length is equal to or longer
703 :     than the location's length, an undefined value will be returned.
704 :    
705 :     =over 4
706 :    
707 :     =item length
708 :    
709 :     Number of positions to split from the location.
710 :    
711 :     =item RETURN
712 :    
713 :     Returns a new location formed by splitting positions off of the existing location, which is
714 :     shortened accordingly. If the specified length is longer than the location's length, an
715 :     undefined value is returned and the location is not modified.
716 :    
717 :     =back
718 :    
719 :     =cut
720 :     #: Return Type $%;
721 :    
722 : parrello 1.2
723 :     =head3 SetBegin
724 :    
725 :     C<< $loc->SetBegin($newBegin); >>
726 :    
727 :     Change the begin point of this location without changing the endpoint.
728 :    
729 :     =over 4
730 :    
731 :     =item newBegin
732 :    
733 :     Proposed new beginning point.
734 :    
735 :     =back
736 :    
737 :     =cut
738 :     #: Return Type ;
739 :    
740 :     =head3 SetEnd
741 :    
742 :     C<< $loc->SetEnd($newEnd); >>
743 :    
744 :     Change the endpoint of this location without changing the begin point.
745 :    
746 :     =over 4
747 :    
748 :     =item newEnd
749 :    
750 :     Proposed new ending point.
751 :    
752 :     =back
753 :    
754 :     =cut
755 :     #: Return Type ;
756 :    
757 : parrello 1.6 =head3 Widen
758 :    
759 :     C<< my = $loc->Widen($distance, $max); >>
760 :    
761 :     Add the specified distance to each end of the location, taking care not to
762 :     extend past either end of the contig. The contig length must be provided
763 :     to insure we don't fall off the far end; otherwise, only the leftward
764 :     expansion is limited.
765 :    
766 :     =over 4
767 :    
768 :     =item distance
769 :    
770 :     Number of positions to add to both ends of the location.
771 :    
772 :     =item max (optional)
773 :    
774 :     Maximum possible value for the right end of the location.
775 :    
776 :     =back
777 :    
778 :     =cut
779 :     #: Return Type ;
780 :    
781 : parrello 1.1 1;
782 :    
783 :    
784 :     package FBasicLocation;
785 :    
786 :     @FBasicLocation::ISA = qw(BasicLocation);
787 :    
788 :     =head1 Forward Basic Location Object
789 :    
790 :     =head2 Introduction
791 :    
792 :     A I<forward location object> is a location in a contig that is transcribed from left to right.
793 :     It is a subclass of the B<BasicLocation> object, and contains methods that require different
794 :     implementation for a forward location than a backward location.
795 :    
796 :     =head2 Override Methods
797 :    
798 :     =head3 Left
799 :    
800 :     C<< my $leftPoint = $loc->Left; >>
801 :    
802 :     Return the offset of the leftmost point of the location.
803 :    
804 :     =cut
805 :    
806 :     sub Left {
807 :     return $_[0]->{_beg};
808 :     }
809 :    
810 :     =head3 Right
811 :    
812 :     C<< my $rightPoint = $loc->Right; >>
813 :    
814 :     Return the offset of the rightmost point of the location.
815 :    
816 :     =cut
817 :    
818 :     sub Right {
819 :     return $_[0]->{_end};
820 :     }
821 :    
822 :     =head3 Compare
823 :    
824 :     C<< my ($distance, $cmp) = $loc->Compare($point); >>
825 :    
826 :     Determine the relative location of the specified point on the contig. Returns a distance,
827 :     which indicates the location relative to the leftmost point of the contig, and a comparison
828 :     number, which is negative if the point is to the left of the location, zero if the point is
829 :     inside the location, and positive if the point is to the right of the location.
830 :    
831 :     =cut
832 :    
833 :     sub Compare {
834 :     # Get the parameters.
835 :     my ($self, $point) = @_;
836 :     # Compute the distance from the begin (leftmost) point.
837 :     my $distance = $point - $self->{_beg};
838 :     # Set the comparison value. The distance works unless it is positive and less than
839 :     # the length. In that case, it's inside the location so we want to return 0.
840 :     my $cmp = (defined $self->IfValid($distance) ? 0 : $distance);
841 :     # Return the results.
842 :     return ($distance, $cmp);
843 :     }
844 :    
845 :     =head3 Split
846 :    
847 :     C<< my $newLocation = $loc->Split($offset); >>
848 :    
849 :     Split this location into two smaller ones at the specified offset from the left endpoint. The
850 :     new location split off of it will be returned. If the offset is at either end of the location,
851 :     no split will occur and an underfined value will be returned.
852 :    
853 :     =over 4
854 :    
855 :     =item offset
856 :    
857 :     Offset into the location from the left endpoint of the point at which it should be split.
858 :    
859 :     =item RETURN
860 :    
861 :     The new location split off of this one, or an undefined value if no split was necessary.
862 :    
863 :     =back
864 :    
865 :     =cut
866 :    
867 :     sub Split {
868 :     # Get the parameters.
869 :     my ($self, $offset) = @_;
870 :     # Declare the return variable.
871 :     my $retVal;
872 :     # Only proceed if a split is necessary.
873 :     if ($offset > 0 && $offset < $self->{_len}) {
874 :     # Save the current starting point.
875 :     my $oldBegin = $self->{_beg};
876 :     # Update this location's starting point and length.
877 :     $self->{_beg} += $offset;
878 :     $self->{_len} -= $offset;
879 :     # Create the new location.
880 :     $retVal = BasicLocation->new($self->{_contigID}, $oldBegin, '+', $offset);
881 :     }
882 :     # Return the new location object.
883 :     return $retVal;
884 :     }
885 :    
886 :    
887 :     =head3 Peel
888 :    
889 :     C<< my $peel = $loc->Peel($length); >>
890 :    
891 :     Peel a specified number of positions off the beginning of the location. Peeling splits
892 :     a location at a specified offset from the beginning, while splitting takes it at a
893 :     specified offset from the left point. If the specified length is equal to or longer
894 :     than the location's length, an undefined value will be returned.
895 :    
896 :     =over 4
897 :    
898 :     =item length
899 :    
900 :     Number of positions to split from the location.
901 :    
902 :     =item RETURN
903 :    
904 :     Returns a new location formed by splitting positions off of the existing location, which is
905 :     shortened accordingly. If the specified length is longer than the location's length, an
906 :     undefined value is returned and the location is not modified.
907 :    
908 :     =back
909 :    
910 :     =cut
911 :     #: Return Type $%;
912 :     sub Peel {
913 :     # Get the parameters.
914 :     my ($self, $length) = @_;
915 :     # Declare the return variable.
916 :     my $retVal;
917 :     # Only proceed if a split is necessary.
918 :     if ($length < $self->{_len}) {
919 :     # Save the current begin point.
920 :     my $oldBegpoint = $self->{_beg};
921 :     # Update this location's begin point and length. We're peeling from the beginning,
922 :     # which for this type of location means the segment is chopped off the left end.
923 :     $self->{_beg} += $length;
924 :     $self->{_len} -= $length;
925 :     # Create the new location.
926 :     $retVal = BasicLocation->new($self->{_contigID}, $oldBegpoint, '+', $length);
927 :     }
928 :     # Return the new location object.
929 :     return $retVal;
930 :     }
931 :    
932 :    
933 :     =head3 Reverse
934 :    
935 :     C<< $loc->Reverse; >>
936 :    
937 :     Change the polarity of the location. The location will have the same nucleotide range, but
938 :     the direction will be changed.
939 :    
940 :     =cut
941 :    
942 :     sub Reverse {
943 :     # Get the parameters.
944 :     my ($self) = @_;
945 :     # Swap the beginning and end, then update the direction.
946 :     ($self->{_beg}, $self->{_end}) = ($self->{_end}, $self->{_beg});
947 :     $self->{_dir} = '-';
948 :     # Re-bless us as a forward location.
949 :     bless $self, "BBasicLocation";
950 :     }
951 :    
952 :     =head3 Index
953 :    
954 :     C<< my $index = $loc->Index($point); >>
955 :    
956 :     Return the index of the specified point in this location. The value returned is the distance
957 :     from the beginning. If the specified point is not in the location, an undefined value is returned.
958 :    
959 :     =over 4
960 :    
961 :     =item point
962 :    
963 :     Offset into the contig of the point in question.
964 :    
965 :     =item RETURN
966 :    
967 :     Returns the distance of the point from the beginning of the location, or an undefined value if the
968 :     point is outside the location.
969 :    
970 :     =back
971 :    
972 :     =cut
973 :    
974 :     sub Index {
975 :     # Get the parameters.
976 :     my ($self, $point) = @_;
977 :     # Compute the distance from the beginning. Because we are in a forward location, this
978 :     # means subtracting the beginning's offset from the point's offset.
979 :     my $retVal = $self->IfValid($point - $self->{_beg});
980 :     # Return the result.
981 :     return $retVal;
982 :     }
983 :    
984 :     =head3 PointOffset
985 :    
986 :     C<< my $offset = $loc->PointOffset($index); >>
987 :    
988 :     Return the offset into the contig of the point at the specified position in the location. A position
989 :     of 0 will return the beginning point, a position of 1 returns the point next to that, and a position
990 :     1 less than the length will return the ending point.
991 :    
992 :     =over 4
993 :    
994 :     =item index
995 :    
996 :     Index into the location of the relevant point.
997 :    
998 :     =item RETURN
999 :    
1000 :     Returns an offset into the contig of the specified point in the location.
1001 :    
1002 :     =back
1003 :    
1004 :     =cut
1005 :    
1006 :     sub PointOffset {
1007 :     # Get the parameters.
1008 :     my ($self, $index) = @_;
1009 :     # Return the offset. This is a forward location, so we add it to the begin point.
1010 :     return $self->{_beg} + $index;
1011 :     }
1012 :    
1013 : parrello 1.2
1014 :     =head3 SetBegin
1015 :    
1016 :     C<< $loc->SetBegin($newBegin); >>
1017 :    
1018 :     Change the begin point of this location without changing the endpoint.
1019 :    
1020 :     =over 4
1021 :    
1022 :     =item newBegin
1023 :    
1024 :     Proposed new beginning point.
1025 :    
1026 :     =back
1027 :    
1028 :     =cut
1029 :     #: Return Type ;
1030 :     sub SetBegin {
1031 :     # Get the parameters.
1032 :     my ($self, $newBegin) = @_;
1033 :     # Update the begin point.
1034 :     $self->{_beg} = $newBegin;
1035 :     # Adjust the length.
1036 :     $self->{_len} = $self->{_end} - $self->{_beg} + 1;
1037 :     }
1038 :    
1039 :     =head3 SetEnd
1040 :    
1041 :     C<< $loc->SetEnd($newEnd); >>
1042 :    
1043 :     Change the endpoint of this location without changing the begin point.
1044 :    
1045 :     =over 4
1046 :    
1047 :     =item newEnd
1048 :    
1049 :     Proposed new ending point.
1050 :    
1051 :     =back
1052 :    
1053 :     =cut
1054 :     #: Return Type ;
1055 :     sub SetEnd {
1056 :     # Get the parameters.
1057 :     my ($self, $newEnd) = @_;
1058 :     # Update the end point.
1059 :     $self->{_end} = $newEnd;
1060 :     # Adjust the length.
1061 :     $self->{_len} = $self->{_end} - $self->{_beg} + 1;
1062 :     }
1063 :    
1064 : parrello 1.6 =head3 Widen
1065 :    
1066 :     C<< my = $loc->Widen($distance, $max); >>
1067 :    
1068 :     Add the specified distance to each end of the location, taking care not to
1069 :     extend past either end of the contig. The contig length must be provided
1070 :     to insure we don't fall off the far end; otherwise, only the leftward
1071 :     expansion is limited.
1072 :    
1073 :     =over 4
1074 :    
1075 :     =item distance
1076 :    
1077 :     Number of positions to add to both ends of the location.
1078 :    
1079 :     =item max (optional)
1080 :    
1081 :     Maximum possible value for the right end of the location.
1082 :    
1083 :     =back
1084 :    
1085 :     =cut
1086 :     #: Return Type ;
1087 :     sub Widen {
1088 :     # Get the parameters.
1089 :     my ($self, $distance, $max) = @_;
1090 :     # Subtract the distance from the begin point.
1091 :     my $newBegin = FIG::max(1, $self->Begin - $distance);
1092 :     $self->SetBegin($newBegin);
1093 :     # Add the distance to the end point, keeping track of the maximum.
1094 :     my $newEnd = $self->EndPoint + $distance;
1095 :     if ($max && $newEnd > $max) {
1096 :     $newEnd = $max;
1097 :     }
1098 :     $self->SetEnd($newEnd);
1099 :     }
1100 : parrello 1.2
1101 : parrello 1.1 1;
1102 :    
1103 :     package BBasicLocation;
1104 :    
1105 :     @BBasicLocation::ISA = qw(BasicLocation);
1106 :    
1107 :     =head1 Backward Basic BasicLocation Object
1108 :    
1109 :     =head2 Introduction
1110 :    
1111 :     A I<backward location object> is a location in a contig that is transcribed from right to left.
1112 :     It is a subclass of the B<BasicLocation> object, and contains methods that require different
1113 :     implementation for a forward location than a backward location.
1114 :    
1115 :     =head2 Override Methods
1116 :    
1117 :     =head3 Left
1118 :    
1119 :     C<< my $leftPoint = $loc->Left; >>
1120 :    
1121 :     Return the offset of the leftmost point of the location.
1122 :    
1123 :     =cut
1124 :    
1125 :     sub Left {
1126 :     return $_[0]->{_end};
1127 :     }
1128 :    
1129 :     =head3 Right
1130 :    
1131 :     C<< my $rightPoint = $loc->Right; >>
1132 :    
1133 :     Return the offset of the rightmost point of the location.
1134 :    
1135 :     =cut
1136 :    
1137 :     sub Right {
1138 :     return $_[0]->{_beg};
1139 :     }
1140 :    
1141 :     =head3 Compare
1142 :    
1143 :     C<< my ($distance, $cmp) = $loc->Compare($point); >>
1144 :    
1145 :     Determine the relative location of the specified point on the contig. Returns a distance,
1146 :     which indicates the location relative to the leftmost point of the contig, and a comparison
1147 :     number, which is negative if the point is to the left of the location, zero if the point is
1148 :     inside the location, and positive if the point is to the right of the location.
1149 :    
1150 :     =cut
1151 :    
1152 :     sub Compare {
1153 :     # Get the parameters.
1154 :     my ($self, $point) = @_;
1155 :     # Compute the distance from the end (leftmost) point.
1156 :     my $distance = $point - $self->{_end};
1157 :     # Set the comparison value. The distance works unless it is positive and less than
1158 :     # the length. In that case, it's inside the location so we want to return 0.
1159 :     my $cmp = (defined $self->IfValid($distance) ? 0 : $distance);
1160 :     # Return the results.
1161 :     return ($distance, $cmp);
1162 :     }
1163 :    
1164 :     =head3 Split
1165 :    
1166 :     C<< my $newLocation = $loc->Split($offset); >>
1167 :    
1168 :     Split this location into two smaller ones at the specified offset from the left endpoint. The
1169 :     new location split off of it will be returned. If the offset is at either end of the location,
1170 :     no split will occur and an underfined value will be returned.
1171 :    
1172 :     =over 4
1173 :    
1174 :     =item offset
1175 :    
1176 :     Offset into the location from the left endpoint of the point at which it should be split.
1177 :    
1178 :     =item RETURN
1179 :    
1180 :     The new location split off of this one, or an undefined value if no split was necessary.
1181 :    
1182 :     =back
1183 :    
1184 :     =cut
1185 :    
1186 :     sub Split {
1187 :     # Get the parameters.
1188 :     my ($self, $offset) = @_;
1189 :     # Declare the return variable.
1190 :     my $retVal;
1191 :     # Only proceed if a split is necessary.
1192 :     if ($offset > 0 && $offset < $self->{_len}) {
1193 :     # Save the current ending point.
1194 :     my $oldEndpoint = $self->{_end};
1195 :     # Update this location's ending point and length.
1196 :     $self->{_end} += $offset;
1197 :     $self->{_len} -= $offset;
1198 :     # Create the new location.
1199 :     $retVal = BasicLocation->new($self->{_contigID}, $oldEndpoint + $offset - 1, '-', $offset);
1200 :     }
1201 :     # Return the new location object.
1202 :     return $retVal;
1203 :     }
1204 :    
1205 :     =head3 Peel
1206 :    
1207 :     C<< my $peel = $loc->Peel($length); >>
1208 :    
1209 :     Peel a specified number of positions off the beginning of the location. Peeling splits
1210 :     a location at a specified offset from the beginning, while splitting takes it at a
1211 :     specified offset from the left point. If the specified length is equal to or longer
1212 :     than the location's length, an undefined value will be returned.
1213 :    
1214 :     =over 4
1215 :    
1216 :     =item length
1217 :    
1218 :     Number of positions to split from the location.
1219 :    
1220 :     =item RETURN
1221 :    
1222 :     Returns a new location formed by splitting positions off of the existing location, which is
1223 :     shortened accordingly. If the specified length is longer than the location's length, an
1224 :     undefined value is returned and the location is not modified.
1225 :    
1226 :     =back
1227 :    
1228 :     =cut
1229 :     #: Return Type $%;
1230 :     sub Peel {
1231 :     # Get the parameters.
1232 :     my ($self, $length) = @_;
1233 :     # Declare the return variable.
1234 :     my $retVal;
1235 :     # Only proceed if a split is necessary.
1236 :     if ($length < $self->{_len}) {
1237 :     # Save the current begin point.
1238 :     my $oldBegpoint = $self->{_beg};
1239 :     # Update this location's begin point and length. We're peeling from the beginning,
1240 :     # which for this type of location means the segment is chopped off the right end.
1241 :     $self->{_beg} -= $length;
1242 :     $self->{_len} -= $length;
1243 :     # Create the new location.
1244 : parrello 1.4 $retVal = BasicLocation->new($self->{_contigID}, $oldBegpoint, '-', $length);
1245 : parrello 1.1 }
1246 :     # Return the new location object.
1247 :     return $retVal;
1248 :     }
1249 :    
1250 :     =head3 Reverse
1251 :    
1252 :     C<< $loc->Reverse; >>
1253 :    
1254 :     Change the polarity of the location. The location will have the same nucleotide range, but
1255 :     the direction will be changed.
1256 :    
1257 :     =cut
1258 :    
1259 :     sub Reverse {
1260 :     # Get the parameters.
1261 :     my ($self) = @_;
1262 :     # Swap the beginning and end, then update the direction.
1263 :     ($self->{_beg}, $self->{_end}) = ($self->{_end}, $self->{_beg});
1264 :     $self->{_dir} = '+';
1265 :     # Re-bless us as a forward location.
1266 :     bless $self, "FBasicLocation";
1267 :     }
1268 :    
1269 :     =head3 Index
1270 :    
1271 :     C<< my $index = $loc->Index($point); >>
1272 :    
1273 :     Return the index of the specified point in this location. The value returned is the distance
1274 :     from the beginning. If the specified point is not in the location, an undefined value is returned.
1275 :    
1276 :     =over 4
1277 :    
1278 :     =item point
1279 :    
1280 :     Offset into the contig of the point in question.
1281 :    
1282 :     =item RETURN
1283 :    
1284 :     Returns the distance of the point from the beginning of the location, or an undefined value if the
1285 :     point is outside the location.
1286 :    
1287 :     =back
1288 :    
1289 :     =cut
1290 :    
1291 :     sub Index {
1292 :     # Get the parameters.
1293 :     my ($self, $point) = @_;
1294 :     # Compute the distance from the beginning. Because we are in a backward location, this
1295 :     # means subtracting the point's offset from the beginning's offset.
1296 :     my $retVal = $self->IfValid($self->{_beg} - $point);
1297 :     # Return the result.
1298 :     return $retVal;
1299 :     }
1300 :    
1301 :     =head3 PointOffset
1302 :    
1303 :     C<< my $offset = $loc->PointOffset($index); >>
1304 :    
1305 :     Return the offset into the contig of the point at the specified position in the location. A position
1306 :     of 0 will return the beginning point, a position of 1 returns the point next to that, and a position
1307 :     1 less than the length will return the ending point.
1308 :    
1309 :     =over 4
1310 :    
1311 :     =item index
1312 :    
1313 :     Index into the location of the relevant point.
1314 :    
1315 :     =item RETURN
1316 :    
1317 :     Returns an offset into the contig of the specified point in the location.
1318 :    
1319 :     =back
1320 :    
1321 :     =cut
1322 :    
1323 :     sub PointOffset {
1324 :     # Get the parameters.
1325 :     my ($self, $index) = @_;
1326 :     # Return the offset. This is a backward location, so we subtract it from the begin point.
1327 :     return $self->{_beg} - $index;
1328 :     }
1329 :    
1330 : parrello 1.2 =head3 SetBegin
1331 :    
1332 :     C<< $loc->SetBegin($newBegin); >>
1333 :    
1334 :     Change the begin point of this location without changing the endpoint.
1335 :    
1336 :     =over 4
1337 :    
1338 :     =item newBegin
1339 :    
1340 :     Proposed new beginning point.
1341 :    
1342 :     =back
1343 :    
1344 :     =cut
1345 :     #: Return Type ;
1346 :     sub SetBegin {
1347 :     # Get the parameters.
1348 :     my ($self, $newBegin) = @_;
1349 :     # Update the begin point.
1350 :     $self->{_beg} = $newBegin;
1351 :     # Adjust the length.
1352 :     $self->{_len} = $self->{_beg} - $self->{_end} + 1;
1353 :     }
1354 :    
1355 :     =head3 SetEnd
1356 :    
1357 :     C<< $loc->SetEnd($newEnd); >>
1358 :    
1359 :     Change the endpoint of this location without changing the begin point.
1360 :    
1361 :     =over 4
1362 :    
1363 :     =item newEnd
1364 :    
1365 :     Proposed new ending point.
1366 :    
1367 :     =back
1368 :    
1369 :     =cut
1370 :     #: Return Type ;
1371 :     sub SetEnd {
1372 :     # Get the parameters.
1373 :     my ($self, $newEnd) = @_;
1374 :     # Update the end point.
1375 :     $self->{_end} = $newEnd;
1376 :     # Adjust the length.
1377 :     $self->{_len} = $self->{_beg} - $self->{_end} + 1;
1378 :     }
1379 :    
1380 : parrello 1.6 =head3 Widen
1381 :    
1382 :     C<< my = $loc->Widen($distance, $max); >>
1383 :    
1384 :     Add the specified distance to each end of the location, taking care not to
1385 :     extend past either end of the contig. The contig length must be provided
1386 :     to insure we don't fall off the far end; otherwise, only the leftward
1387 :     expansion is limited.
1388 :    
1389 :     =over 4
1390 : parrello 1.2
1391 : parrello 1.6 =item distance
1392 :    
1393 :     Number of positions to add to both ends of the location.
1394 :    
1395 :     =item max (optional)
1396 :    
1397 :     Maximum possible value for the right end of the location.
1398 :    
1399 :     =back
1400 :    
1401 :     =cut
1402 :     #: Return Type ;
1403 :     sub Widen {
1404 :     # Get the parameters.
1405 :     my ($self, $distance, $max) = @_;
1406 :     # Subtract the distance from the end point.
1407 :     my $newEnd = FIG::max(1, $self->End - $distance);
1408 :     $self->SetEnd($newEnd);
1409 :     # Add the distance to the begin point, keeping track of the maximum.
1410 :     my $newBegin = $self->Begin + $distance;
1411 :     if ($max && $newBegin > $max) {
1412 :     $newBegin = $max;
1413 :     }
1414 :     $self->SetBegin($newEnd);
1415 :     }
1416 : parrello 1.2
1417 : parrello 1.1 1;
1418 :    

MCS Webmaster
ViewVC Help
Powered by ViewVC 1.0.3