Revision 1.14
#!/usr/bin/perl -w

=head1 Load Sprout Tables

=head2 Introduction

The Sprout database reflects a snapshot of the SEED taken at a particular point in
time. At some point in the future, it will be possible to add annotations to the
Sprout data. All records added to Sprout after the snapshot is taken are
specially marked so that the changes can be copied to the SEED. The SEED remains
the live version of the data.

The snapshot is produced by reading the SEED data and writing it to sequential
files. There is one file per Sprout table, and each file's name consists of
the table name with the suffix C<dtx>. Thus, the file for the C<Genome> table
is named C<Genome.dtx>. These files are used to load the actual Sprout
database and to generate the Glimpse indexes.

To load all the Sprout tables and then validate the result, issue the following
three commands.

    LoadSproutTables -dbLoad -dbCreate "*"
    TestSproutLoad
    index_sprout

All three commands send output to the console. In addition, C<LoadSproutTables> and
C<TestSproutLoad> write tracing information to C<trace.log> in the FIG temporary
directory (B<$FIG_Config::temp>). The bottom of the log file contains a complete
list of errors. If errors occur in C<LoadSproutTables>, the data must be corrected
and the offending table group reloaded. For example, if there are errors in the
load of the B<MadeAnnotation> and B<Compound> tables, you would need to run

    LoadSproutTables -dbLoad Annotation Reaction

because B<MadeAnnotation> is in the C<Annotation> group, and B<Compound> is in the
C<Reaction> group. A list of the groups is given below.

You can omit the C<dbLoad> option to create the load files without
loading the database, and you can add a C<trace> option to change the trace level.
The command below creates the Genome-related load files with a trace level of 3 and
does not load them into the Sprout database.

    LoadSproutTables -trace=3 Genome

C<LoadSproutTables> takes a long time to run. A trace level of 3, which is the
default, reports enough detail to show its progress.

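The option syntax above can be illustrated with a small sketch. It uses Perl's core C<Getopt::Long> module in place of C<Tracer::ParseCommand>, which is what the script actually uses and whose exact parsing rules may differ. The subroutine name C<parse_load_options> is hypothetical. The sketch applies the same defaults as the script and returns the table group names that remain after the options are removed.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical stand-in for Tracer::ParseCommand: parse LoadSproutTables-style
# options ("-dbLoad", "-trace=3", ...) using the same defaults as the script.
# Returns a hash reference of options followed by the table group names.
sub parse_load_options {
    my (@args) = @_;
    my %options = (geneFile => "", subsysFile => "", trace => 3,
                   limitedFeatures => 0, dbLoad => 0, dbCreate => 0);
    GetOptionsFromArray(\@args, \%options,
                        'geneFile=s', 'subsysFile=s', 'trace=i',
                        'limitedFeatures!', 'dbLoad!', 'dbCreate!')
        or die "Invalid command-line options.\n";
    # Anything left over is a table group name, such as "Genome" or "*".
    return (\%options, @args);
}
```

For example, C<parse_load_options('-trace=3', 'Genome')> yields a C<trace> value of 3, a C<dbLoad> value of 0, and the single group C<Genome>. An unquoted C<*> would be expanded by the shell into file names, which is why the full-load command quotes it.
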
Once the Sprout database is loaded, C<TestSproutLoad> can be used to verify the load
against the FIG data. Again, the end of the C<trace.log> file contains a summary
of the errors found. Like C<LoadSproutTables>, C<TestSproutLoad> is a time-consuming
script, so you may want to set the trace level to 3 to see its progress.

    TestSproutLoad -trace=3

Unlike C<LoadSproutTables>, C<TestSproutLoad> mixes the individual errors in with
the trace messages. All of them, however, are marked with a trace type of
B<Problem>, as shown in the fragment below.

    11/02/2005 19:15:16 <main>: Processing feature fig|100226.1.peg.7742.
    11/02/2005 19:15:17 <main>: Processing feature fig|100226.1.peg.7741.
    11/02/2005 19:15:17 <Problem>: assignment "Short-chain dehydrodenase ...
    11/02/2005 19:15:17 <Problem>: assignment "putative oxidoreductase." ...
    11/02/2005 19:15:17 <Problem>: Incorrect assignment for fig|100226.1.peg.7741...
    11/02/2005 19:15:17 <Problem>: Incorrect number of annotations found in ...
    11/02/2005 19:15:17 <main>: Processing feature fig|100226.1.peg.7740.
    11/02/2005 19:15:18 <main>: Processing feature fig|100226.1.peg.7739.

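The B<Problem> messages are interleaved with ordinary progress messages, so it can be convenient to pull them out of the log. This sketch is not part of the Sprout code. The subroutine name C<problem_lines> and the script name C<problems.pl> are hypothetical. The sketch assumes that every problem message contains the literal marker C<< <Problem>: >>, as in the fragment above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the trace lines whose type is <Problem>.
# Assumes the "<Problem>:" marker shown in the log fragment above.
sub problem_lines {
    my ($fh) = @_;
    my @problems;
    while (my $line = <$fh>) {
        chomp $line;
        push @problems, $line if $line =~ /<Problem>:/;
    }
    return @problems;
}

# Usage: perl problems.pl /path/to/trace.log
if (@ARGV) {
    my $file = $ARGV[0];
    open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
    my @problems = problem_lines($fh);
    close($fh);
    print "$_\n" for @problems;
    print scalar(@problems), " problem message(s) found.\n";
}
```
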
The test may reveal that some tables need to be reloaded, or that a software
problem has crept into Sprout.

Once all the tables have the correct data, C<index_sprout> can be run to create the
Glimpse indexes.

=head2 Procedure For Loading Sprout

=over 4

=item 1

Type C<LoadSproutTables -dbLoad -dbCreate "*"> and press ENTER. This creates
the C<dtx> files and loads them.

=item 2

Type C<TestSproutLoad> and press ENTER. This validates the Sprout database
against the SEED data.

=item 3

If step 2 detects any errors, the most likely cause is a change in the
SEED that did not make it into Sprout. Contact Bruce Parrello or Robert Olson
to get the code updated.

=item 4

Type C<index_sprout> and press ENTER. This creates the Glimpse indexes
for the Sprout data.

=back

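The procedure above can be scripted. This is a minimal sketch that assumes each script reports failure through a nonzero exit status. This document does not confirm that assumption. The subroutine name C<run_steps> is hypothetical. C<TestSproutLoad> reports its errors in C<trace.log>, so check the log after step 2 even if the script exits normally.

```perl
use strict;
use warnings;

# Hypothetical driver for the loading procedure. Each step is an array
# reference holding a command and its arguments. The steps run in order,
# and the first failure (nonzero exit status, an assumption) stops the run.
# Returns 0 on success, or the 1-based number of the step that failed.
sub run_steps {
    my (@steps) = @_;
    for my $i (0 .. $#steps) {
        my @command = @{ $steps[$i] };
        my $number = $i + 1;
        print "Step $number: @command\n";
        # The list form of system() runs the command without a shell.
        if (system(@command) != 0) {
            warn "Step $number failed: @command\n";
            return $number;
        }
    }
    return 0;
}

# Example corresponding to steps 1, 2, and 4 (left commented out because
# it rebuilds the whole database):
# my $failed = run_steps(
#     ['LoadSproutTables', '-dbLoad', '-dbCreate', '*'],
#     ['TestSproutLoad'],
#     ['index_sprout'],
# );
```

Each command is passed to C<system> as a list, which bypasses the shell. The C<*> argument therefore reaches C<LoadSproutTables> unexpanded without any quoting.
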
=head2 LoadSproutTables Command

C<LoadSproutTables> creates the load files for Sprout tables and optionally loads them.
The parameters are the names of the table groups whose data is to be created.
The legal table group names are given below.

=over 4

=item Genome

Loads B<Genome>, B<HasContig>, B<Contig>, B<IsMadeUpOf>, and B<Sequence>.

=item Coupling

Loads B<Coupling>, B<IsEvidencedBy>, B<PCH>, B<ParticipatesInCoupling>,
B<UsesAsEvidence>.

=item Feature

Loads B<Feature>, B<FeatureAlias>, B<FeatureTranslation>, B<FeatureUpstream>,
B<IsLocatedIn>, B<FeatureLink>.

=item Subsystem

Loads B<Subsystem>, B<Role>, B<SSCell>, B<ContainsFeature>, B<IsGenomeOf>,
B<IsRoleOf>, B<OccursInSubsystem>, B<ParticipatesIn>, B<HasSSCell>,
B<Catalyzes>, B<ConsistsOfRoles>, B<RoleSubset>, B<HasRoleSubset>,
B<ConsistsOfGenomes>, B<GenomeSubset>, B<HasGenomeSubset>, B<Diagram>,
B<RoleOccursIn>.

=item Annotation

Loads B<SproutUser>, B<UserAccess>, B<Annotation>, B<IsTargetOfAnnotation>,
B<MadeAnnotation>.

=item Property

Loads B<Property>, B<HasProperty>.

=item BBH

Loads B<IsBidirectionalBestHitOf>.

=item Group

Loads B<GenomeGroups>.

=item Source

Loads B<Source>, B<ComesFrom>, B<SourceURL>.

=item External

Loads B<ExternalAliasOrg>, B<ExternalAliasFunc>.

=item Reaction

Loads B<ReactionURL>, B<Compound>, B<CompoundName>,
B<CompoundCAS>, B<IsAComponentOf>, B<Reaction>.

=item C<*>

Loads all of the above tables.

=back

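The group list above can also be written as data. That makes it easy to work out which groups to reload after a failed load, as in the B<MadeAnnotation> and B<Compound> example in the introduction. This is a sketch only. The hash C<%GROUP_TABLES> and the subroutines C<groups_to_reload> and C<load_file_name> are hypothetical restatements of this documentation, not structures exported by C<SproutLoad>.

```perl
use strict;
use warnings;

# Hypothetical restatement of the table group list above (revision 1.14).
# SproutLoad does not export a structure like this.
my %GROUP_TABLES = (
    Genome     => [qw(Genome HasContig Contig IsMadeUpOf Sequence)],
    Coupling   => [qw(Coupling IsEvidencedBy PCH ParticipatesInCoupling
                      UsesAsEvidence)],
    Feature    => [qw(Feature FeatureAlias FeatureTranslation FeatureUpstream
                      IsLocatedIn FeatureLink)],
    Subsystem  => [qw(Subsystem Role SSCell ContainsFeature IsGenomeOf
                      IsRoleOf OccursInSubsystem ParticipatesIn HasSSCell
                      Catalyzes ConsistsOfRoles RoleSubset HasRoleSubset
                      ConsistsOfGenomes GenomeSubset HasGenomeSubset Diagram
                      RoleOccursIn)],
    Annotation => [qw(SproutUser UserAccess Annotation IsTargetOfAnnotation
                      MadeAnnotation)],
    Property   => [qw(Property HasProperty)],
    BBH        => [qw(IsBidirectionalBestHitOf)],
    Group      => [qw(GenomeGroups)],
    Source     => [qw(Source ComesFrom SourceURL)],
    External   => [qw(ExternalAliasOrg ExternalAliasFunc)],
    Reaction   => [qw(ReactionURL Compound CompoundName CompoundCAS
                      IsAComponentOf Reaction)],
);

# Given the names of tables whose load failed, return the sorted names of
# the groups that must be reloaded. Dies on a table name not listed above.
sub groups_to_reload {
    my (@tables) = @_;
    my %owner;
    for my $group (keys %GROUP_TABLES) {
        $owner{$_} = $group for @{ $GROUP_TABLES{$group} };
    }
    my %needed;
    for my $table (@tables) {
        my $group = $owner{$table} or die "Unknown table: $table\n";
        $needed{$group} = 1;
    }
    return sort keys %needed;
}

# Name of the load file for a table, for example "Genome" => "Genome.dtx".
sub load_file_name {
    my ($table) = @_;
    return "$table.dtx";
}
```

For the introduction's example, C<groups_to_reload('MadeAnnotation', 'Compound')> returns C<('Annotation', 'Reaction')>. That matches the command C<LoadSproutTables -dbLoad Annotation Reaction>.
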
The command-line options are given below.

=over 4

=item geneFile

The name of the file containing the genomes and their associated access codes. The
file should have one line per genome, each line consisting of the genome ID followed
by the access code, separated by a tab. If no file is specified, all complete genomes
are processed and the access code is 1.

=item subsysFile

The name of the file containing the trusted subsystems. The file should have one line
per trusted subsystem. If no file is specified, all subsystems are trusted.

=item trace

Desired tracing level. The default is 3.

=item limitedFeatures

When processing the C<Feature> group, generate only the B<Feature> and B<IsLocatedIn> tables.

=item dbLoad

If TRUE, the database tables are loaded automatically from the newly created load files.

=item dbCreate

If TRUE, the database is created. If the database already exists, it is dropped
first, which destroys its contents. Use this option with caution.

=back

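The two input file formats described above can be read with a few lines of Perl. This sketch is illustrative only: the subroutine names are hypothetical, and C<SproutLoad> does its own parsing. Skipping blank lines is an assumption that the documentation does not state. The sketch does not model the defaults that apply when no file is given (all complete genomes with access code 1, all subsystems trusted).

```perl
use strict;
use warnings;

# Hypothetical reader for the geneFile format: one genome per line, the
# genome ID and the access code separated by a tab.
# Returns a hash reference mapping genome ID => access code.
sub read_gene_file {
    my ($fh) = @_;
    my %access;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines (an assumption)
        my ($genome, $code) = split /\t/, $line;
        $access{$genome} = $code;
    }
    return \%access;
}

# Hypothetical reader for the subsysFile format: one trusted subsystem
# name per line. Returns a hash reference used as a set of names.
sub read_subsys_file {
    my ($fh) = @_;
    my %trusted;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines (an assumption)
        $trusted{$line} = 1;
    }
    return \%trusted;
}
```
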
=cut

use strict;
use Tracer;
use DocUtils;
use Cwd;
use FIG;
use SFXlate;
use File::Copy;
use File::Path;
use SproutLoad;
use Stats;

# Get the command-line parameters and options.
my ($options, @parameters) = Tracer::ParseCommand({ geneFile => "", subsysFile => "",
                                                    trace => 3, limitedFeatures => 0,
                                                    dbLoad => 0, dbCreate => 0 }, @ARGV);
# Set up tracing.
TSetup("$options->{trace} SproutLoad ERDBLoad ERDB Stats Tracer Load", "+>$FIG_Config::temp/trace.log");
if ($options->{dbCreate}) {
    # Here we want to drop and re-create the database. The result of the drop
    # command is not checked, so a missing database does not stop the script.
    my $db = $FIG_Config::sproutDB;
    if ($FIG_Config::dbms eq "Pg") {
        my $dbport = $FIG_Config::dbport;
        my $dbuser = $FIG_Config::dbuser;
        system("dropdb -p $dbport -U $dbuser $db");
        &FIG::run("createdb -p $dbport -U $dbuser $db");
    } elsif ($FIG_Config::dbms eq "mysql") {
        # "-p" makes mysqladmin prompt for the password, and "drop" asks for
        # confirmation, so this branch must be run interactively.
        system("mysqladmin -u $FIG_Config::dbuser -p drop $db");
        &FIG::run("mysqladmin -u $FIG_Config::dbuser -p create $db");
    }
}
# Create the sprout loader object. Note that the Sprout object does not
# open the database unless the "dbLoad" option is turned on.
my $fig = FIG->new();
my $sprout = SFXlate->new_sprout_only(undef, undef, undef, ! $options->{dbLoad});
my $spl = SproutLoad->new($sprout, $fig, $options->{geneFile}, $options->{subsysFile}, $options);
# Process the parameters.
for my $group (@parameters) {
    Trace("Processing load group $group.") if T(2);
    if ($group eq 'Genome' || $group eq '*') {
        $spl->LoadGenomeData();
    }
    if ($group eq 'Feature' || $group eq '*') {
        $spl->LoadFeatureData();
    }
    if ($group eq 'Coupling' || $group eq '*') {
        $spl->LoadCouplingData();
    }
    if ($group eq 'Subsystem' || $group eq '*') {
        $spl->LoadSubsystemData();
    }
    if ($group eq 'Property' || $group eq '*') {
        $spl->LoadPropertyData();
    }
    if ($group eq 'Annotation' || $group eq '*') {
        $spl->LoadAnnotationData();
    }
    if ($group eq 'BBH' || $group eq '*') {
        $spl->LoadBBHData();
    }
    if ($group eq 'Group' || $group eq '*') {
        $spl->LoadGroupData();
    }
    if ($group eq 'Source' || $group eq '*') {
        $spl->LoadSourceData();
    }
    if ($group eq 'External' || $group eq '*') {
        $spl->LoadExternalData();
    }
    if ($group eq 'Reaction' || $group eq '*') {
        $spl->LoadReactionData();
    }
}
Trace("Load complete.") if T(2);

1;