Annotation of /FigKernelPackages/MergeTransactions.pm

Revision 1.1

1 : parrello 1.1 #!/usr/bin/perl -w
2 :    
3 :     package MergeTransactions;
4 :    
5 :     use TransactionProcessor;
6 :     @ISA = ('TransactionProcessor');
7 :    
8 :     use strict;
9 :     use Tracer;
10 :     use PageBuilder;
11 :     use FIG;
12 :    
13 :     =head1 Merge Transactions
14 :    
15 :     =head2 Special Note
16 :    
17 :     THIS MODULE HAS NOT BEEN IMPLEMENTED. IT IS SAVED FOR POSSIBLE FUTURE USE.
18 :    
19 :     =head2 Introduction
20 :    
21 :     This is a TransactionProcessor subclass that updates the C<peg.synonyms> and C<NR> files
22 :     to take into account transaction changes. Our goal is to re-constitute the PEG synonyms
23 :     table and then rebuild the NR file from it.
24 :    
25 :     To understand this process, we must first understand the concept of a PEG synonym.
26 :     Two PEGs are considered I<pseudo-equivalent> if the shorter matches the tail of
27 :     the longer, and the shorter is no less than 70% the length of the longer. A pair
28 :     of PEGs are considered I<synonyms> if they are both pseudo-equivalent to the same
29 :     longer PEG.
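
As an illustration, the pseudo-equivalence test described above can be sketched in Perl. This is a minimal sketch and not code from this module: the name C<is_pseudo_equivalent> is hypothetical, and the sketch assumes the comparison is made on protein translation strings.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MergeTransactions): returns 1 if the
# shorter translation matches the tail of the longer one and is no less
# than 70% of its length, otherwise 0.
sub is_pseudo_equivalent {
    my ($seqA, $seqB) = @_;
    # Order the pair so that $short is never longer than $long.
    my ($short, $long) = length($seqA) <= length($seqB)
                       ? ($seqA, $seqB)
                       : ($seqB, $seqA);
    # Assumption: an empty sequence is never equivalent to anything.
    return 0 if length($short) == 0;
    # The shorter must be no less than 70% the length of the longer
    # (integer arithmetic avoids floating-point rounding).
    return 0 if length($short) * 10 < length($long) * 7;
    # The shorter must match the tail of the longer.
    return substr($long, -length($short)) eq $short ? 1 : 0;
}
```

Under this sketch, two sequences of equal length are pseudo-equivalent only when they are identical, since the tail of the longer is then the whole string.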
30 :    
31 :     The I<synonym> relation is almost, but not quite, an equivalence relation. We can
32 :     nevertheless partition PEGs into classes with the property that each PEG
33 :     in a class is pseudo-equivalent to a single PEG of maximal length, called
34 :     the I<principal synonym>. Given such a partition, similarities between principal
35 :     synonyms will usually imply similarity between each pair of PEGs in the principal
36 :     synonyms' classes. That is, if principal synonym C<A> is similar to principal
37 :     synonym C<B>, then each PEG in C<A>'s class is probably similar to each PEG in
38 :     C<B>'s class.
39 :    
40 :     The C<NR> file contains the translation for each principal synonym, in FASTA
41 :     form. It is this file that is used to generate similarities.
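
For illustration, one entry of a FASTA file can be produced as sketched below. The function name C<fasta_record> and the default 60-character line width are assumptions made for this example; the text above does not specify the header layout or line width of the real C<NR> file.

```perl
use strict;
use warnings;

# Hypothetical helper: format one principal synonym as a FASTA record.
# $width is the number of residues per sequence line (assumed default 60).
sub fasta_record {
    my ($id, $translation, $width) = @_;
    $width ||= 60;
    # Split the translation into chunks of at most $width characters.
    my @lines = unpack("(A$width)*", $translation);
    # The header line starts with '>' followed by the ID.
    return join("\n", ">$id", @lines) . "\n";
}

# Example with a made-up ID and sequence.
print fasta_record('fig|1.1.peg.1', 'MKTAYIAKQR', 4);
```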
42 :    
43 :     When transactions come through, they will delete some PEGs, add new PEGs, and
44 :     replace old PEGs with new ones. For the purposes of this algorithm, we will
45 :     treat the transactions as a set of deletes and a set of insertions. The
46 :     merge process then involves three steps.
47 :    
48 :     =over 4
49 :    
50 :     =item Collect
51 :    
52 :     Collect the IDs of the inserted PEGs. These will be output to a file
53 :     containing each PEG ID and its reversed translation. The file
54 :     will be sorted by translation for the purposes of the merge step.
55 :    
56 :     =item Reduce
57 :    
58 :     Remove the deleted PEGs from the C<peg.synonyms> file. We will assume
59 :     that a PEG has been deleted if it is either marked for deletion or we
60 :     cannot find it in the SEED database. This process will also output a
61 :     file containing the principal synonyms, their reversed translations,
62 :     and the PEGs in their equivalence classes. This file will also be sorted by
63 :     translation for the purposes of the merge step.
64 :    
65 :     =item Merge
66 :    
67 :     Merge the inserted PEGs into the C<peg.synonyms> file and re-create
68 :     the NR file. This involves going through the two output files from
69 :     the previous steps to determine whether the new PEGs belong in a
70 :     new class, will be the new principal synonym of an existing class,
71 :     or will merely be a member of an existing class.
72 :    
73 :     =back
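
The Collect step above can be sketched as follows. Reversing each translation turns a tail match into a prefix match, so after sorting, a sequence sorts directly before the longer sequences that end with it; this is presumably why both intermediate files are sorted by reversed translation. The function name C<collect_records>, the input hash layout, and the sample IDs are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: pair each inserted PEG ID with its reversed
# translation and sort the pairs by that reversed string.
sub collect_records {
    my ($translations) = @_;    # hash ref: PEG ID => translation (assumed shape)
    my @records = map { [ $_, scalar reverse($translations->{$_}) ] }
                  keys %$translations;
    # Sort by reversed translation; break ties by ID so the order is repeatable.
    my @sorted = sort { $a->[1] cmp $b->[1] or $a->[0] cmp $b->[0] } @records;
    return @sorted;
}

# Example with made-up IDs and sequences: one tab-separated line per PEG.
my @records = collect_records({
    'fig|1.1.peg.1' => 'MKTAY',
    'fig|1.1.peg.2' => 'TAY',
    'fig|1.1.peg.3' => 'MKQ',
});
print join("\t", @$_), "\n" for @records;
```

In this example C<TAY> is a tail of C<MKTAY>, so its reversed form C<YAT> sorts immediately before C<YATKM>, which lets a merge pass compare neighbouring records instead of every pair.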
74 :    
75 :     =head2 Methods
76 :    
77 :     =head3 new
78 :    
79 :     C<< my $xprc = MergeTransactions->new(\%options, $command, $directory, $idFile); >>
80 :    
81 :     Construct a new MergeTransactions object.
82 :    
83 :     =over 4
84 :    
85 :     =item options
86 :    
87 :     Reference to a hash table containing the command-line options.
88 :    
89 :     =item command
90 :    
91 :     Command specified on the B<TransactFeatures> command line. This command determines
92 :     which TransactionProcessor subclass is active.
93 :    
94 :     =item directory
95 :    
96 :     Directory containing the transaction files.
97 :    
98 :     =item idFile
99 :    
100 :     Name of the ID file (if needed).
101 :    
102 :     =back
103 :    
104 :     =cut
105 :    
106 :     sub new {
107 :     # Get the parameters.
108 :     my ($class, $options, $command, $directory, $idFile) = @_;
109 :     # Construct via the superclass.
110 :     return TransactionProcessor::new($class, $options, $command, $directory, $idFile);
111 :     }
112 :    
113 :     =head3 Setup
114 :    
115 :     C<< $xprc->Setup(); >>
116 :    
117 :     Set up to apply the transactions. This includes reading the ID file.
118 :    
119 :     =cut
120 :     #: Return Type ;
121 :     sub Setup {
122 :     # Get the parameters.
123 :     my ($self) = @_;
124 :     # Read the ID hash from the ID file.
125 :     $self->ReadIDHash();
126 :     # TODO
127 :     }
128 :    
129 :     =head3 SetupGenome
130 :    
131 :     C<< $xprc->SetupGenome(); >>
132 :    
133 :     Set up for processing this genome. This involves opening the output file
134 :     for the transaction trace. The transaction trace essentially contains the
135 :     incoming transactions with the pseudo-IDs replaced by real IDs.
136 :    
137 :     =cut
138 :     #: Return Type ;
139 :     sub SetupGenome {
140 :     # Get the parameters.
141 :     my ($self) = @_;
142 :     my $fig = $self->FIG();
143 :     # TODO
144 :     }
145 :    
146 :     =head3 TeardownGenome
147 :    
148 :     C<< $xprc->TeardownGenome(); >>
149 :    
150 :     Clean up after processing this genome. This involves closing the transaction
151 :     trace file and optionally committing any updates.
152 :    
153 :     =cut
154 :     #: Return Type ;
155 :     sub TeardownGenome {
156 :     # Get the parameters.
157 :     my ($self) = @_;
158 :     my $fig = $self->FIG();
159 :     # TODO
160 :     }
161 :    
162 :     =head3 Add
163 :    
164 :     C<< $xprc->Add($newID, $locations, $translation); >>
165 :    
166 :     Add a new feature to the data store.
167 :    
168 :     =over 4
169 :    
170 :     =item newID
171 :    
172 :     ID to give to the new feature.
173 :    
174 :     =item locations
175 :    
176 :     Location of the new feature, in the form of a comma-separated list of location
177 :     strings in SEED format.
178 :    
179 :     =item translation (optional)
180 :    
181 :     Protein translation string for the new feature. If this field is omitted and
182 :     the feature is a peg, the translation will be generated by normal means.
183 :    
184 :     =back
185 :    
186 :     =cut
187 :    
188 :     sub Add {
189 :     my ($self, $newID, $locations, $translation) = @_;
190 :     my $fig = $self->{fig};
191 :     # Extract the feature type and ordinal number from the new ID.
192 :     my ($ftype, $ordinal, $key) = $self->ParseNewID($newID);
193 :     # TODO
194 :     }
195 :    
196 :     =head3 Change
197 :    
198 :     C<< $xprc->Change($fid, $newID, $locations, $aliases, $translation); >>
199 :    
200 :     Replace a feature in the data store. The feature will be marked for deletion and
201 :     a new feature will be put in its place.
202 :    
203 :     This is a much more complicated process than adding a feature. In addition to
204 :     the add, we have to create new aliases and transfer across the assignment and
205 :     the annotations.
206 :    
207 :     =over 4
208 :    
209 :     =item fid
210 :    
211 :     ID of the feature being changed.
212 :    
213 :     =item newID
214 :    
215 :     New ID to give to the feature.
216 :    
217 :     =item locations
218 :    
219 :     New location to give to the feature, in the form of a comma-separated list of location
220 :     strings in SEED format.
221 :    
222 :     =item aliases (optional)
223 :    
224 :     A new list of alias names for the feature.
225 :    
226 :     =item translation (optional)
227 :    
228 :     New protein translation string for the feature. If this field is omitted and
229 :     the feature is a peg, the translation will be generated by normal means.
230 :    
231 :     =back
232 :    
233 :     =cut
234 :    
235 :     sub Change {
236 :     my ($self, $fid, $newID, $locations, $aliases, $translation) = @_;
237 :     my $fig = $self->{fig};
238 :     # Extract the feature type and ordinal number from the new ID.
239 :     my ($ftype, $ordinal, $key) = $self->ParseNewID($newID);
240 :     # TODO
241 :     }
242 :    
243 :     =head3 Delete
244 :    
245 :     C<< $xprc->Delete($fid); >>
246 :    
247 :     Delete a feature from the data store. The feature will be marked as deleted,
248 :     which will remove it from consideration by most FIG methods. A garbage
249 :     collection job will be run later to permanently delete the feature.
250 :    
251 :     =over 4
252 :    
253 :     =item fid
254 :    
255 :     ID of the feature to delete.
256 :    
257 :     =back
258 :    
259 :     =cut
260 :    
261 :     sub Delete {
262 :     my ($self, $fid) = @_;
263 :     my $fig = $self->{fig};
264 :     # TODO
265 :     }
266 :    
267 :     1;
