commands.

    LoadSproutTables -dbLoad -dbCreate "*"
    TestSproutLoad [genomeID] ...
    index_sprout_lucene

where I<[genomeID]> is one or more genome IDs. These genomes will be tested more
thoroughly than the others.

All three commands send output to the console. In addition, C<LoadSproutTables> and
C<TestSproutLoad> write tracing information to a trace log in the FIG temporary
directory (B<$FIG_Config::Tmp>). At the bottom of the log file will be a complete
list of errors. If errors occur in C<LoadSproutTables>, then the data must be corrected
and the offending table group reloaded. So, for example, if there are errors in the

to give you an idea of the progress.
|
Once the Sprout database is loaded, B<TestSproutLoad> can be used to verify the load
against the FIG data. The end of the trace log file will contain statistics on
the errors found. Like C<LoadSproutTables>, C<TestSproutLoad> is a time-consuming
script, so you may want to set the trace level to 3 to see visible progress.

    TestSproutLoad -trace=3 [genomeID] ...

The I<[genomeID]> specifies zero or more IDs of genomes to receive more thorough
testing. So, for example,

    TestSproutLoad -trace=3 100226.1 83333.1

would do thorough testing of I<Streptomyces coelicolor A3-2> (100226.1) and
I<Escherichia coli K12> (83333.1).
|
Unlike C<LoadSproutTables>, C<TestSproutLoad> mixes the individual errors it finds
in with the trace messages. They are all, however, marked with a trace type

The test may reveal that some tables need to be reloaded, or that a software
problem has crept into the Sprout.

Once all the tables have the correct data, C<index_sprout_lucene> can be run to create the
Lucene search indexes. Lucene is a search engine library produced by the Apache project.
It is written in Java, and in order to run it you must have the B<LuceneSearch> and
B<NmpdrConfigs> projects checked out from CVS and made.
|
=head2 The NMPDR Web Site

Sprout is the database engine for the NMPDR web site. The NMPDR web site consists of two
pieces that run on two different machines. The B<WEB> machine contains HTML pages
generated by a Content Management Tool.

=head2 Procedure For Loading Sprout

In order to load the Sprout, you need to have the B<Sprout>, B<NmpdrConfigs>, and
B<LuceneSearch> projects checked out from CVS in addition to the standard FIG
projects. You must also set up the following B<FIG_Config.pm> variables in addition
to the normal ones.
|
=over 4

=item sproutData

Name of the data directory for the Sprout load files.

=item var

Name of the directory to contain cached NMPDR pages. The most important file in
this directory is C<nmpdr_page_template.html>, which contains a skeleton page
from the main NMPDR web site. This skeleton page is used to generate output
pages that look like the other NMPDR pages.

=item java

Path to the Java runtime environment.

=item sproutDB

Name of the Sprout database.

=item dbuser

User name for logging into the Sprout database.

=item dbpass

Password for logging into the Sprout database.

=item nmpdr_site_url

URL for the NMPDR cover pages. The NMPDR cover pages are informational and text
pages that serve as the entry point to the NMPDR web site. They are generated by
a Content Management tool, and some Sprout scripts need to know where to find
them.

=item nmpdr_site_template_id

Page number for the template page used to generate results that look like they're
part of the NMPDR web site.

=back
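A minimal sketch of these settings in C<FIG_Config.pm> might look like the
following. Only the variable names come from the list above; every value shown
is a placeholder that must be replaced with the paths, names, and IDs for your
own installation.

    # Sprout load settings (placeholder values only).
    $sproutData = "/path/to/FIGdisk/SproutData";    # Sprout load file directory
    $var = "/path/to/FIGdisk/var";                  # cached NMPDR pages
    $java = "/usr/bin/java";                        # Java runtime environment
    $sproutDB = "sprout";                           # Sprout database name
    $dbuser = "fig";                                # database user name
    $dbpass = "";                                   # database password
    $nmpdr_site_url = "http://www.example.org/";    # NMPDR cover page URL
    $nmpdr_site_template_id = 100;                  # template page number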
|
The procedure for loading Sprout is as follows.

=over 4

=item 1

Type

    nohup LoadSproutTables -dbLoad -dbCreate -user=you -background "*" >null &

where C<you> is your user ID, and press ENTER. This will create the C<dtx> files
and load them. You may be asked for a password. If this is the case, simply
press ENTER. If that does not work, use the C<dbpass> value specified in
your C<FIG_Config.pm> file.

The above command line runs the load in the background. The standard output,
standard error, and trace output will be directed to files in the FIG temporary
directory. If your user name is C<Bruce> then the files will be named
C<outBruce.log>, C<errBruce.log>, and C<traceBruce.log> respectively.

If the load fails at some point and you are able to correct the problem, use the
C<resume> option to restart it. For example, if the load failed while doing the
Feature load group, you would resume it using

    nohup LoadSproutTables -dbLoad -user=you -resume -background Feature >null &
|
=item 2

Type

    nohup TestSproutLoad -user=you -background 100226.1 83333.1 >null &

and press ENTER. This will validate the Sprout database against the SEED data.
The genome IDs listed on the command line receive more thorough testing.

=item 3
|
=item 4

Type

    index_sprout_lucene

and press ENTER. This will create the Lucene indexes for the Sprout data.

=item 5

Change to the B<SproutData/Indexes> directory under B<FIGdisk> and look for the
directory created by C<index_sprout_lucene>. The directory name will be
something like C<Lucene.20060412-154112>. The numbers indicate the date and time
the index was created. In this case it was 04/12/2006 03:41:12pm. Type

    ln -sf directory Lucene

where C<directory> is the new directory name, to point the C<Lucene> directory to the
new search index.

=back
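For example, using the date-stamped directory name shown in step 5, the commands
might look something like the following (the exact path to B<FIGdisk> is a
placeholder that depends on your installation):

    cd /path/to/FIGdisk/SproutData/Indexes
    ln -sf Lucene.20060412-154112 Lucene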
|
Loads B<Genome>, B<HasContig>, B<Contig>, B<IsMadeUpOf>, and B<Sequence>.

=item Feature

Loads B<Feature>, B<FeatureAlias>, B<FeatureTranslation>, B<FeatureUpstream>,
B<IsLocatedIn>, B<FeatureLink>.

=item Coupling

Loads B<Coupling>, B<IsEvidencedBy>, B<PCH>, B<ParticipatesInCoupling>,
B<UsesAsEvidence>.

=item Subsystem

Loads B<Subsystem>, B<Role>, B<SSCell>, B<ContainsFeature>, B<IsGenomeOf>,
|
Desired tracing level. The default is 3.

=item user

Suffix to use for trace, output, and error files created in the FIG temporary
directory.

=item dbLoad

If TRUE, the database tables will be loaded from the generated load files.

=item dbCreate

If TRUE, the database will be created. If the database exists already, it will be
dropped. Use this option with caution.
|
=item loadOnly

If TRUE, the database tables will be loaded from existing load files. Load files
will not be created. This option is useful if you are setting up a copy of Sprout
and have load files already set up from the original version.

=item primaryOnly

If TRUE, only the group's primary entity will be loaded.

=item background

Redirect the standard and error output to files in the FIG temporary directory.

=item resume

Resume an interrupted load, starting with the load group specified in the first
positional parameter.

=item sql

Trace SQL statements.

=back
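As an illustration of how these options combine with the list of load groups
(this exact invocation does not appear elsewhere in this document), one might
reload the Genome and Feature groups from previously generated load files with
verbose tracing like this:

    LoadSproutTables -loadOnly -trace=3 Genome Feature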
|
=cut |
use SFXlate;

# Get the command-line parameters and options.
my ($options, @parameters) = StandardSetup(['SproutLoad', 'ERDBLoad', 'Stats',
                                            'ERDB', 'Load', 'Sprout', 'Subsystem'],
                                           { geneFile => ["", "name of the genome list file"],
                                             subsysFile => ["", "name of the trusted subsystem file"],
                                             dbLoad => [0, "load the database from generated files"],
                                             dbCreate => [0, "drop and re-create the database"],
                                             loadOnly => [0, "load the database from previously generated files"],
                                             primaryOnly => [0, "only process the group's main entity"],
                                             resume => [0, "resume a complete load starting with the first group specified in the parameter list"],
                                           },
                                           "<group1> <group2> ...",
                                           @ARGV);
# If we're doing a load-only, turn on loading.
if ($options->{loadOnly}) {
    $options->{dbLoad} = 1;
}
if ($options->{dbCreate}) {
    # Here we want to drop and re-create the database.
    my $db = $FIG_Config::sproutDB;
    DBKernel::CreateDB($db);
}
# Create the sprout loader object. Note that the Sprout object does not
# open the database unless the "dbLoad" option is turned on.
my $spl = SproutLoad->new($sprout, $fig, $options->{geneFile}, $options->{subsysFile}, $options);
# Insure we have an output directory.
FIG::verify_dir($FIG_Config::sproutData);
# If we're resuming, we only want to have 1 parameter.
my $resume = $options->{resume};
if ($resume && @parameters > 1) {
    Confess("If resume=1, only one load group can be specified.");
} elsif (! @parameters) {
    Confess("No load groups were specified.");
}
# Process the parameters.
for my $group (@parameters) {
    Trace("Processing load group $group.") if T(2);
    my $stats;
    if ($group eq 'Genome' || $group eq '*') {
        $spl->LoadGenomeData();
        $group = ResumeCheck($resume, $group);
    }
    if ($group eq 'Feature' || $group eq '*') {
        $spl->LoadFeatureData();
        $group = ResumeCheck($resume, $group);
    }
    if ($group eq 'Coupling' || $group eq '*') {
        $spl->LoadCouplingData();
        $group = ResumeCheck($resume, $group);
    }
    if ($group eq 'Subsystem' || $group eq '*') {
        $spl->LoadSubsystemData();
        $group = ResumeCheck($resume, $group);
    }
    if ($group eq 'Property' || $group eq '*') {
        $spl->LoadPropertyData();
        $group = ResumeCheck($resume, $group);
    }
    if ($group eq 'Annotation' || $group eq '*') {
        $spl->LoadAnnotationData();
        $group = ResumeCheck($resume, $group);
    }
    if ($group eq 'BBH' || $group eq '*') {
        $spl->LoadBBHData();
        $group = ResumeCheck($resume, $group);
    }
    if ($group eq 'Group' || $group eq '*') {
        $spl->LoadGroupData();
        $group = ResumeCheck($resume, $group);
    }
    if ($group eq 'Source' || $group eq '*') {
        $spl->LoadSourceData();
        $group = ResumeCheck($resume, $group);
    }
    if ($group eq 'External' || $group eq '*') {
        $spl->LoadExternalData();
        $group = ResumeCheck($resume, $group);
    }
    if ($group eq 'Reaction' || $group eq '*') {
        $spl->LoadReactionData();
        $group = ResumeCheck($resume, $group);
    }
}
Trace("Load complete.") if T(2);
|
# If the resume flag is set, return "*"; otherwise return the group unchanged.
sub ResumeCheck {
    my ($resume, $group) = @_;
    return ($resume ? "*" : $group);
}

1;