
Annotation of /FigKernelPackages/MergeTransactions.pm

Revision 1.2

#!/usr/bin/perl -w
#
# Copyright (c) 2003-2006 University of Chicago and Fellowship
# for Interpretations of Genomes. All Rights Reserved.
#
# This file is part of the SEED Toolkit.
#
# The SEED Toolkit is free software. You can redistribute
# it and/or modify it under the terms of the SEED Toolkit
# Public License.
#
# You should have received a copy of the SEED Toolkit Public License
# along with this program; if not write to the University of Chicago
# at info@ci.uchicago.edu or the Fellowship for Interpretation of
# Genomes at veronika@thefig.info or download a copy from
# http://www.theseed.org/LICENSE.TXT.
#

package MergeTransactions;

use TransactionProcessor;
@ISA = ('TransactionProcessor');

use strict;
use Tracer;
use PageBuilder;
use FIG;

=head1 Merge Transactions

=head2 Special Note

THIS MODULE HAS NOT BEEN IMPLEMENTED. IT IS SAVED FOR POSSIBLE FUTURE USE.

=head2 Introduction

This is a TransactionProcessor subclass that updates the C<peg.synonyms> and C<NR> files
to take into account transaction changes. Our goal is to reconstitute the PEG synonyms
table and then rebuild the NR file from it.

To understand this process, we must first understand the concept of a PEG synonym.
Two PEGs are considered I<pseudo-equivalent> if the shorter matches the tail of
the longer, and the shorter is no less than 70% the length of the longer. Two
PEGs are considered I<synonyms> if they are both pseudo-equivalent to the same
longer PEG.
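
The pseudo-equivalence rule above can be sketched as a small predicate. This is
only an illustration under stated assumptions: the function name
C<is_pseudo_equivalent> is hypothetical and does not exist in this module or in
FIG, and the treatment of empty sequences is a guess.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of this module): returns 1 if the shorter
# translation matches the tail of the longer one and is no less than 70%
# of its length, and 0 otherwise.
sub is_pseudo_equivalent {
    my ($seqA, $seqB) = @_;
    # Order the pair so that $long is at least as long as $short.
    my ($short, $long) = length($seqA) <= length($seqB)
                       ? ($seqA, $seqB) : ($seqB, $seqA);
    # Two empty sequences are treated as not equivalent (an assumption).
    return 0 if length($long) == 0;
    # Length test in integer arithmetic: short/long >= 70/100.
    return 0 if length($short) * 10 < length($long) * 7;
    # Tail test: the longer sequence must end with the shorter one.
    return substr($long, -length($short)) eq $short ? 1 : 0;
}

# Made-up sequences for illustration.
print is_pseudo_equivalent("MKTAYIAKQR", "TAYIAKQR"), "\n";   # 1: tail match, 80% length
print is_pseudo_equivalent("MKTAYIAKQR", "AKQR"), "\n";       # 0: tail match, only 40% length
```

The length test is done with integer multiplication so that the 70% boundary
is not subject to floating-point rounding.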

The I<synonym> relation is not quite an equivalence relation. We can, however,
partition the PEGs into classes such that each PEG in a class is
pseudo-equivalent to a single PEG of maximal length, called the
I<principal synonym>. Given such a partition, a similarity between two
principal synonyms will usually imply similarity between each pair of PEGs
drawn from their two classes. That is, if principal synonym C<A> is similar
to principal synonym C<B>, then each PEG in C<A>'s class is probably similar
to each PEG in C<B>'s class.

The C<NR> file contains the translation for each principal synonym, in FASTA
form. It is this file that is used to generate similarities.
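
As a rough illustration of what one C<NR> entry might look like, the sketch
below formats a single principal synonym as a FASTA record. The helper name
C<fasta_record>, the 60-column line width, and the sample ID and sequence are
all assumptions made for this example; the actual layout of the SEED C<NR>
file is not specified here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of this module): format one principal
# synonym as a FASTA record. The default 60-column width is an assumption.
sub fasta_record {
    my ($id, $translation, $width) = @_;
    $width ||= 60;
    # Split the sequence into chunks of at most $width characters.
    my @lines = unpack("(A$width)*", $translation);
    # Header line, then the sequence lines, each terminated by a newline.
    return join("\n", ">$id", @lines) . "\n";
}

# Made-up ID and sequence, wrapped at 4 columns to keep the output short.
print fasta_record('peg.example', 'MKTAYIAKQR', 4);
```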

When transactions come through, they will delete some PEGs, add new PEGs, and
replace old PEGs with new ones. For the purposes of this algorithm, we will
treat the transactions as a set of deletions and a set of insertions. The
merge process then involves three steps.

=over 4

=item Collect

Collect the IDs of the inserted PEGs. These will be output to a file
containing each PEG ID and its reversed translation. The file
will be sorted by translation for the purposes of the merge step.

=item Reduce

Remove the deleted PEGs from the C<peg.synonyms> file. We will assume
that a PEG has been deleted if it is either marked for deletion or we
cannot find it in the SEED database. This process will also output a
file containing the principal synonyms, their reversed translations,
and the PEGs in their equivalence classes. This file will also be sorted by
translation for the purposes of the merge step.

=item Merge

Merge the inserted PEGs into the C<peg.synonyms> file and re-create
the NR file. This involves going through the two output files from
the previous steps to determine whether each new PEG belongs in a
new class, will be the new principal synonym of an existing class,
or is merely a member of an existing class.

=back
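
The Collect step above can be sketched as follows. Reversing each translation
turns the tail match used by pseudo-equivalence into a prefix match, so after
sorting, a shorter translation comes before any longer translation it is a
tail of; this is presumably why both intermediate files are sorted by reversed
translation. The helper name C<collect_records>, the in-memory hash input, and
the sample IDs are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of this module): build one
# [PEG ID, reversed translation] record per inserted PEG and return the
# records sorted by reversed translation.
sub collect_records {
    my ($translations) = @_;    # hash ref: PEG ID => protein translation
    my @records = map { [$_, scalar reverse($translations->{$_})] }
                  keys %$translations;
    # Sort by reversed translation, breaking ties by ID for a repeatable order.
    my @sorted = sort { $a->[1] cmp $b->[1] or $a->[0] cmp $b->[0] } @records;
    return @sorted;
}

# Made-up IDs and sequences: peg.b is a tail of peg.a.
my %translations = (
    'peg.a' => 'MKTAYIAK',
    'peg.b' => 'TAYIAK',
    'peg.c' => 'MSSQR',
);
# Prints one tab-separated "ID, reversed translation" line per PEG.
print join("\t", @$_), "\n" for collect_records(\%translations);
```

In this sample, C<peg.b> sorts directly before C<peg.a> because its reversed
translation is a prefix of C<peg.a>'s.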

=head2 Methods

=head3 new

C<< my $xprc = MergeTransactions->new(\%options, $command, $directory, $idFile); >>

Construct a new MergeTransactions object.

=over 4

=item options

Reference to a hash table containing the command-line options.

=item command

Command specified on the B<TransactFeatures> command line. This command determines
which TransactionProcessor subclass is active.

=item directory

Directory containing the transaction files.

=item idFile

Name of the ID file (if needed).

=back

=cut

sub new {
    # Get the parameters.
    my ($class, $options, $command, $directory, $idFile) = @_;
    # Construct via the superclass.
    return TransactionProcessor::new($class, $options, $command, $directory, $idFile);
}

=head3 Setup

C<< $xprc->Setup(); >>

Set up to apply the transactions. This includes reading the ID file.

=cut
#: Return Type ;
sub Setup {
    # Get the parameters.
    my ($self) = @_;
    # Read the ID hash from the ID file.
    $self->ReadIDHash();
    # TODO
}

=head3 SetupGenome

C<< $xprc->SetupGenome(); >>

Set up for processing this genome. This involves opening the output file
for the transaction trace. The transaction trace essentially contains the
incoming transactions with the pseudo-IDs replaced by real IDs.

=cut
#: Return Type ;
sub SetupGenome {
    # Get the parameters.
    my ($self) = @_;
    my $fig = $self->FIG();
    # TODO
}

=head3 TeardownGenome

C<< $xprc->TeardownGenome(); >>

Clean up after processing this genome. This involves closing the transaction
trace file and optionally committing any updates.

=cut
#: Return Type ;
sub TeardownGenome {
    # Get the parameters.
    my ($self) = @_;
    my $fig = $self->FIG();
    # TODO
}

=head3 Add

C<< $xprc->Add($newID, $locations, $translation); >>

Add a new feature to the data store.

=over 4

=item newID

ID to give to the new feature.

=item locations

Location of the new feature, in the form of a comma-separated list of location
strings in SEED format.

=item translation (optional)

Protein translation string for the new feature. If this field is omitted and
the feature is a peg, the translation will be generated by normal means.

=back

=cut

sub Add {
    my ($self, $newID, $locations, $translation) = @_;
    my $fig = $self->{fig};
    # Extract the feature type and ordinal number from the new ID.
    my ($ftype, $ordinal, $key) = $self->ParseNewID($newID);
    # TODO
}

=head3 Change

C<< $xprc->Change($fid, $newID, $locations, $aliases, $translation); >>

Replace a feature in the data store. The feature will be marked for deletion and
a new feature will be put in its place.

This is a much more complicated process than adding a feature. In addition to
the add, we have to create new aliases and transfer across the assignment and
the annotations.

=over 4

=item fid

ID of the feature being changed.

=item newID

New ID to give to the feature.

=item locations

New location to give to the feature, in the form of a comma-separated list of location
strings in SEED format.

=item aliases (optional)

A new list of alias names for the feature.

=item translation (optional)

New protein translation string for the feature. If this field is omitted and
the feature is a peg, the translation will be generated by normal means.

=back

=cut

sub Change {
    my ($self, $fid, $newID, $locations, $aliases, $translation) = @_;
    my $fig = $self->{fig};
    # Extract the feature type and ordinal number from the new ID.
    my ($ftype, $ordinal, $key) = $self->ParseNewID($newID);
    # TODO
}

=head3 Delete

C<< $xprc->Delete($fid); >>

Delete a feature from the data store. The feature will be marked as deleted,
which will remove it from consideration by most FIG methods. A garbage
collection job will be run later to permanently delete the feature.

=over 4

=item fid

ID of the feature to delete.

=back

=cut

sub Delete {
    my ($self, $fid) = @_;
    my $fig = $self->{fig};
    # TODO
}

1;
