48 |
C<LoadSproutTables> takes a long time to run, so setting the trace level to 3 helps |
C<LoadSproutTables> takes a long time to run, so setting the trace level to 3 helps |
49 |
to give you an idea of the progress. |
to give you an idea of the progress. |
50 |
|
|
|
Once the Sprout database is loaded, B<TestSproutLoad> can be used to verify the load |
|
|
against the FIG data. The end of the trace log file will contain statistics on |
|
|
the errors found. Like C<LoadSproutTables>, C<TestSproutLoad> is a time-consuming |
|
|
script, so you may want to set the trace level to 3 to see visible progress. |
|
|
|
|
|
TestSproutLoad -trace=3 [genomeID] ... |
|
|
|
|
|
The I<[genomeID]> specifies zero or more IDs of genomes to receive more thorough |
|
|
testing. So, for example, |
|
|
|
|
|
TestSproutLoad -trace=3 100226.1 83333.1 |
|
|
|
|
|
would do thorough testing of I<Streptomyces coelicolor A3-2> (100226.1) and |
|
|
I<Escherichia coli K12> (83333.1). |
|
|
|
|
|
Unlike C<LoadSproutTables>, in C<TestSproutLoad>, the individual errors found are |
|
|
mixed in with the trace messages. They are all, however, marked with a trace type |
|
|
of B<Problem>, as shown in the fragment below. |
|
|
|
|
|
11/02/2005 19:15:16 <main>: Processing feature fig|100226.1.peg.7742. |
|
|
11/02/2005 19:15:17 <main>: Processing feature fig|100226.1.peg.7741. |
|
|
11/02/2005 19:15:17 <Problem>: assignment "Short-chain dehydrodenase ... |
|
|
11/02/2005 19:15:17 <Problem>: assignment "putative oxidoreductase." ... |
|
|
11/02/2005 19:15:17 <Problem>: Incorrect assignment for fig|100226.1.peg.7741... |
|
|
11/02/2005 19:15:17 <Problem>: Incorrect number of annotations found in ... |
|
|
11/02/2005 19:15:17 <main>: Processing feature fig|100226.1.peg.7740. |
|
|
11/02/2005 19:15:18 <main>: Processing feature fig|100226.1.peg.7739. |
|
|
|
|
|
The test may reveal that some tables need to be reloaded, or that a software |
|
|
problem has crept into the Sprout. |
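Because the individual errors are interleaved with ordinary trace messages, it can help to pull out only the lines whose trace type is B<Problem>. The following is a minimal sketch, assuming every log line has the C<date time E<lt>typeE<gt>: message> layout shown in the fragment above. The C<problem_lines> helper is hypothetical and is not part of C<TestSproutLoad>.

```perl
use strict;
use warnings;

# Hypothetical helper: return the message text of every trace line whose
# type is "Problem". Assumes lines look like "date time <Type>: message".
sub problem_lines {
    my @lines = @_;
    my @problems;
    for my $line (@lines) {
        if ($line =~ /^\S+\s+\S+\s+<Problem>:\s*(.*)$/) {
            push @problems, $1;
        }
    }
    return @problems;
}

# Sample lines copied from the trace fragment above.
my @log = (
    '11/02/2005 19:15:17 <main>: Processing feature fig|100226.1.peg.7741.',
    '11/02/2005 19:15:17 <Problem>: Incorrect assignment for fig|100226.1.peg.7741...',
    '11/02/2005 19:15:17 <main>: Processing feature fig|100226.1.peg.7740.',
);
my @found = problem_lines(@log);
print scalar(@found), " problem line(s) found\n";
print "  $_\n" for @found;
```

In practice the lines would be read from the trace log file rather than from a literal list.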
|
|
|
|
|
Once all the tables have the correct data, C<index_sprout_lucene> can be run to create the |
|
|
Lucene search indexes. Lucene is a web site search engine produced by the Apache project. |
|
|
It is written in Java, and in order to run it you must have the B<LuceneSearch> and |
|
|
B<NmpdrConfigs> projects checked out from CVS and made. |
|
|
|
|
51 |
=head2 The NMPDR Web Site |
=head2 The NMPDR Web Site |
52 |
|
|
53 |
Sprout is the database engine for the NMPDR web site. The NMPDR web site consists of two |
Sprout is the database engine for the NMPDR web site. The NMPDR web site consists of two |
106 |
|
|
107 |
=over 4 |
=over 4 |
108 |
|
|
109 |
The procedure for loading Sprout is as follows. |
Most of the above preparation is performed by the B<NMPDRSetup> utility. |
110 |
|
NMPDRSetup prints the instructions for completing the process, including |
111 |
|
loading the Sprout database. The specific procedure for loading |
112 |
|
the Sprout data, however, is as follows. |
113 |
|
|
114 |
=item 1 |
=item 1 |
115 |
|
|
116 |
Type |
Type |
117 |
|
|
118 |
nohup LoadSproutTables -dbLoad -dbCreate -user=you -background "*" >null & |
nohup LoadSproutTables -dbLoad -user=you -background "*" >null & |
119 |
|
|
120 |
where C<you> is your user ID, and press ENTER. This will create the C<dtx> files |
where C<you> is your user ID, and press ENTER. |
|
and load them. You may be asked for a password. If this is the case, simply |
|
|
press ENTER. If that does not work, use the C<dbpass> value specified in |
|
|
your C<FIG_Config.pm> file. |
|
121 |
|
|
122 |
The above command line runs the load in the background. The standard output, |
The above command line runs the load in the background. The standard output, |
123 |
standard error, and trace output will be directed to files in the FIG temporary |
standard error, and trace output will be directed to files in the FIG temporary |
134 |
|
|
135 |
Type |
Type |
136 |
|
|
|
nohup TestSproutLoad -user=you -background 100226.1 83333.1 >null & |
|
|
|
|
|
and press ENTER. This will validate the Sprout database against the SEED data. |
|
|
|
|
|
=item 3 |
|
|
|
|
|
If any errors are detected in step (2), it is most likely due to a change in |
|
|
SEED that did not make it to Sprout. Contact Bruce Parrello or Robert Olson |
|
|
to get the code updated properly. |
|
|
|
|
|
=item 4 |
|
|
|
|
|
Type |
|
|
|
|
137 |
index_sprout_lucene |
index_sprout_lucene |
138 |
|
|
139 |
and press ENTER. This will create the Lucene indexes for the Sprout data. |
and press ENTER. This will create the Lucene indexes for the Sprout data. |
140 |
|
|
|
=item 5 |
|
|
|
|
|
Change to the B<SproutData/Indexes> directory under B<FIGdisk> and look for the |
|
|
directory created by C<index_sprout_lucene>. The directory name will be |
|
|
something like C<Lucene.20060412-154112>. The numbers indicate the date and time |
|
|
the index was created. In this case it was 04/12/2006 03:41:12pm. Type |
|
|
|
|
|
ln -sf directory Lucene |
|
|
|
|
|
where C<directory> is the new directory name, to point the C<Lucene> directory to the |
|
|
new search index. |
|
|
|
|
141 |
=back |
=back |
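The timestamp embedded in the index directory name can be decoded mechanically. The sketch below assumes the name always follows the C<Lucene.YYYYMMDD-HHMMSS> pattern of the example in step 5. The C<parse_index_name> helper is hypothetical and is not part of C<index_sprout_lucene>.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "Lucene.YYYYMMDD-HHMMSS" into "MM/DD/YYYY HH:MM:SS"
# (24-hour clock). Returns nothing if the name does not fit the pattern.
sub parse_index_name {
    my ($name) = @_;
    if ($name =~ /^Lucene\.(\d{4})(\d{2})(\d{2})-(\d{2})(\d{2})(\d{2})$/) {
        return "$2/$3/$1 $4:$5:$6";
    }
    return;
}

# Prints "04/12/2006 15:41:12", that is, 03:41:12pm on 04/12/2006.
print parse_index_name('Lucene.20060412-154112'), "\n";
```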
142 |
|
|
143 |
=head2 LoadSproutTables Command |
=head2 LoadSproutTables Command |
155 |
=item Feature |
=item Feature |
156 |
|
|
157 |
Loads B<Feature>, B<FeatureAlias>, B<FeatureTranslation>, B<FeatureUpstream>, |
Loads B<Feature>, B<FeatureAlias>, B<FeatureTranslation>, B<FeatureUpstream>, |
158 |
B<IsLocatedIn>, B<FeatureLink>. |
B<IsLocatedIn>, B<FeatureLink>, B<IsAliasOf>, B<CDD>, B<HasFeature>, |
159 |
|
B<HasRoleInSubsystem>, B<FeatureEssential>, B<FeatureVirulent>, B<FeatureIEDB>, |
160 |
=item Coupling |
and B<IsPresentOnProteinOf>. |
|
|
|
|
Loads B<Coupling>, B<IsEvidencedBy>, B<PCH>, B<ParticipatesInCoupling>, |
|
|
B<UsesAsEvidence>. |
|
161 |
|
|
162 |
=item Subsystem |
=item Subsystem |
163 |
|
|
165 |
B<IsRoleOf>, B<OccursInSubsystem>, B<ParticipatesIn>, B<HasSSCell>, |
B<IsRoleOf>, B<OccursInSubsystem>, B<ParticipatesIn>, B<HasSSCell>, |
166 |
B<Catalyzes>, B<ConsistsOfRoles>, B<RoleSubset>, B<HasRoleSubset>, |
B<Catalyzes>, B<ConsistsOfRoles>, B<RoleSubset>, B<HasRoleSubset>, |
167 |
B<ConsistsOfGenomes>, B<GenomeSubset>, B<HasGenomeSubset>, B<Diagram>, |
B<ConsistsOfGenomes>, B<GenomeSubset>, B<HasGenomeSubset>, B<Diagram>, |
168 |
B<RoleOccursIn>. |
B<RoleOccursIn>, B<SubSystemClass>, B<RoleEC>, B<IsIdentifiedByEC>, |
169 |
|
B<ContainsFeature>. |
170 |
|
|
171 |
=item Annotation |
=item Annotation |
172 |
|
|
177 |
|
|
178 |
Loads B<Property>, B<HasProperty>. |
Loads B<Property>, B<HasProperty>. |
179 |
|
|
|
=item BBH |
|
|
|
|
|
Loads B<IsBidirectionalBestHitOf>. |
|
|
|
|
180 |
=item Group |
=item Group |
181 |
|
|
182 |
Loads B<GenomeGroups>. |
Loads B<GenomeGroups>. |
198 |
|
|
199 |
Loads B<SynonymGroup> and B<IsSynonymGroupFor>. |
Loads B<SynonymGroup> and B<IsSynonymGroupFor>. |
200 |
|
|
201 |
|
=item Family |
202 |
|
|
203 |
|
Loads B<Family> and B<IsFamilyForFeature>. |
204 |
|
|
205 |
|
=item Drug |
206 |
|
|
207 |
|
Loads B<PDB>, B<DocksWith>, B<IsProteinForFeature>, and B<Ligand>. |
208 |
|
|
209 |
=item * |
=item * |
210 |
|
|
211 |
Loads all of the above tables. |
Loads all of the above tables. |
221 |
The name of the file containing the genomes and their associated access codes. The |
The name of the file containing the genomes and their associated access codes. The |
222 |
file should have one line per genome, each line consisting of the genome ID followed |
file should have one line per genome, each line consisting of the genome ID followed |
223 |
by the access code, separated by a tab. If no file is specified, all complete genomes |
by the access code, separated by a tab. If no file is specified, all complete genomes |
224 |
will be processed and the access code will be 1. |
will be processed and the access code will be 1. Specify C<default> to use the |
225 |
|
default gene file, C<genes.tbl> in the C<SproutData> directory. |
226 |
|
|
227 |
=item subsysFile |
=item subsysFile |
228 |
|
|
235 |
|
|
236 |
=item user |
=item user |
237 |
|
|
238 |
Suffix to use for trace, output, and error files created in |
Suffix to use for trace, output, and error files created. |
239 |
|
|
240 |
=item dbLoad |
=item dbLoad |
241 |
|
|
252 |
will not be created. This option is useful if you are setting up a copy of Sprout |
will not be created. This option is useful if you are setting up a copy of Sprout |
253 |
and have load files already set up from the original version. |
and have load files already set up from the original version. |
254 |
|
|
|
=item primaryOnly |
|
|
|
|
|
If TRUE, only the group's primary entity will be loaded. |
|
|
|
|
255 |
=item background |
=item background |
256 |
|
|
257 |
Redirect the standard and error output to files in the FIG temporary directory. |
Redirect the standard and error output to files in the FIG temporary directory. |
285 |
use Stats; |
use Stats; |
286 |
use SFXlate; |
use SFXlate; |
287 |
|
|
288 |
|
# This is a list of the load groups in their natural order. We'll go through these in sequence, processing |
289 |
|
# the ones the user asks for. |
290 |
|
my @LoadGroups = qw(Genome Feature Subsystem Property Annotation Source External Reaction Synonym Family Drug); |
291 |
|
|
292 |
# Get the command-line parameters and options. |
# Get the command-line parameters and options. |
293 |
my ($options, @parameters) = StandardSetup(['SproutLoad', 'ERDBLoad', 'Stats', |
my ($options, @parameters) = StandardSetup(['SproutLoad', 'ERDBLoad', 'Stats', |
294 |
'ERDB', 'Load', 'Sprout', 'Subsystem'], |
'ERDB', 'Load', 'Sprout', 'Subsystem'], |
297 |
dbLoad => [0, "load the database from generated files"], |
dbLoad => [0, "load the database from generated files"], |
298 |
dbCreate => [0, "drop and re-create the database"], |
dbCreate => [0, "drop and re-create the database"], |
299 |
loadOnly => [0, "load the database from previously generated files"], |
loadOnly => [0, "load the database from previously generated files"], |
|
primaryOnly => [0, "only process the group's main entity"], |
|
300 |
resume => [0, "resume a complete load starting with the first group specified in the parameter list"], |
resume => [0, "resume a complete load starting with the first group specified in the parameter list"], |
301 |
phone => ["", "phone number (international format) to call when load finishes"], |
phone => ["", "phone number (international format) to call when load finishes"], |
302 |
}, |
}, |
311 |
my $db = $FIG_Config::sproutDB; |
my $db = $FIG_Config::sproutDB; |
312 |
DBKernel::CreateDB($db); |
DBKernel::CreateDB($db); |
313 |
} |
} |
314 |
|
# Compute the gene file name. |
315 |
|
my $geneFile = $options->{geneFile}; |
316 |
|
if ($geneFile eq 'default') { |
317 |
|
$geneFile = "$FIG_Config::sproutData/genes.tbl"; |
318 |
|
} |
319 |
# Create the sprout loader object. Note that the Sprout object does not |
# Create the sprout loader object. Note that the Sprout object does not |
320 |
# open the database unless the "dbLoad" option is turned on. |
# open the database unless the "dbLoad" option is turned on. |
321 |
my $fig = FIG->new(); |
my $fig = FIG->new(); |
322 |
my $sprout = SFXlate->new_sprout_only(undef, undef, undef, ! $options->{dbLoad}); |
my $sprout = SFXlate->new_sprout_only(undef, undef, undef, ! $options->{dbLoad}); |
323 |
my $spl = SproutLoad->new($sprout, $fig, $options->{geneFile}, $options->{subsysFile}, $options); |
my $spl = SproutLoad->new($sprout, $fig, $geneFile, $options->{subsysFile}, $options); |
324 |
# Ensure we have an output directory. |
# Ensure we have an output directory. |
325 |
FIG::verify_dir($FIG_Config::sproutData); |
FIG::verify_dir($FIG_Config::sproutData); |
326 |
|
# Check for the "*" option. |
327 |
|
if ($parameters[0] eq '*') { |
328 |
|
@parameters = @LoadGroups; |
329 |
|
} |
330 |
# If we're resuming, we only want to have 1 parameter. |
# If we're resuming, we only want to have 1 parameter. |
331 |
my $resume = $options->{resume}; |
my $resume = $options->{resume}; |
332 |
if ($resume && @parameters > 1) { |
if ($resume && @parameters > 1) { |
333 |
Confess("If resume=1, only one load group can be specified."); |
Confess("If resume=1, only one load group can be specified."); |
334 |
} elsif (! @parameters) { |
} elsif (! @parameters) { |
335 |
Confess("No load groups were specified."); |
Trace("No load groups were specified.") if T(0); |
336 |
|
} |
337 |
|
# Process the resume option here. We modify the incoming parameters to |
338 |
|
# contain the resume group and everything after it. |
339 |
|
if ($resume) { |
340 |
|
# Save the starting group. |
341 |
|
my $resumeGroup = $parameters[0]; |
342 |
|
# Copy the load group list into the parameter array. |
343 |
|
@parameters = @LoadGroups; |
344 |
|
# Shift out the groups until we reach our desired starting point. |
345 |
|
while (scalar(@parameters) && $parameters[0] ne $resumeGroup) { |
346 |
|
shift @parameters; |
347 |
|
} |
348 |
|
if (! @parameters) { |
349 |
|
Confess("Resume group \"$resumeGroup\" not found."); |
350 |
|
} |
351 |
} |
} |
352 |
# Set a variable to contain return type information. |
# Set a variable to contain return type information. |
353 |
my $rtype; |
my $rtype; |
354 |
|
# Set up a statistics object for statistics about the entire load. |
355 |
|
my $totalStats = Stats->new(); |
356 |
# Ensure we catch errors. |
# Ensure we catch errors. |
357 |
eval { |
eval { |
358 |
# Process the parameters. |
# Process the parameters. |
359 |
for my $group (@parameters) { |
for my $group (@parameters) { |
360 |
Trace("Processing load group $group.") if T(2); |
Trace("Processing load group $group.") if T(2); |
361 |
my $stats; |
# Compute the string we want to execute. |
362 |
if ($group eq 'Genome' || $group eq '*') { |
my $code = "\$spl->Load${group}Data()"; |
363 |
$spl->LoadGenomeData(); |
# Load this group. |
364 |
$group = ResumeCheck($resume, $group); |
my $stats = eval($code); |
365 |
} |
if ($@) { |
366 |
if ($group eq 'Feature' || $group eq '*') { |
Confess("Load group error: $@"); |
367 |
$spl->LoadFeatureData(); |
} |
368 |
$group = ResumeCheck($resume, $group); |
# Merge the statistics into the master. |
369 |
} |
$totalStats->Accumulate($stats); |
|
if ($group eq 'Coupling' || $group eq '*') { |
|
|
$spl->LoadCouplingData(); |
|
|
$group = ResumeCheck($resume, $group); |
|
|
} |
|
|
if ($group eq 'Subsystem' || $group eq '*') { |
|
|
$spl->LoadSubsystemData(); |
|
|
$group = ResumeCheck($resume, $group); |
|
|
} |
|
|
if ($group eq 'Property' || $group eq '*') { |
|
|
$spl->LoadPropertyData(); |
|
|
$group = ResumeCheck($resume, $group); |
|
|
} |
|
|
if ($group eq 'Annotation' || $group eq '*') { |
|
|
$spl->LoadAnnotationData(); |
|
|
$group = ResumeCheck($resume, $group); |
|
|
} |
|
|
if ($group eq 'BBH' || $group eq '*') { |
|
|
$spl->LoadBBHData(); |
|
|
$group = ResumeCheck($resume, $group); |
|
|
} |
|
|
if ($group eq 'Group' || $group eq '*') { |
|
|
$spl->LoadGroupData(); |
|
|
$group = ResumeCheck($resume, $group); |
|
|
} |
|
|
if ($group eq 'Source' || $group eq '*') { |
|
|
$spl->LoadSourceData(); |
|
|
$group = ResumeCheck($resume, $group); |
|
|
} |
|
|
if ($group eq 'External' || $group eq '*') { |
|
|
$spl->LoadExternalData(); |
|
|
$group = ResumeCheck($resume, $group); |
|
|
} |
|
|
if ($group eq 'Reaction' || $group eq '*') { |
|
|
$spl->LoadReactionData(); |
|
|
$group = ResumeCheck($resume, $group); |
|
|
} |
|
|
if ($group eq 'Synonym' || $group eq '*') { |
|
|
$spl->LoadSynonymData(); |
|
|
$group = ResumeCheck($resume, $group); |
|
370 |
} |
} |
371 |
|
# Compute the statistical display. |
372 |
|
my $statDisplay = $totalStats->Show(); |
373 |
|
# Display it. |
374 |
|
Trace("Statistics for this load:\n$statDisplay") if T(2); |
375 |
|
# Check for a "table load failed" message. If we find one, we want |
376 |
|
# to end with an error. |
377 |
|
if ($statDisplay =~ /table load failed/i) { |
378 |
|
Confess("One or more table loads failed."); |
379 |
} |
} |
380 |
}; |
}; |
381 |
if ($@) { |
if ($@) { |
385 |
Trace("Load complete.") if T(2); |
Trace("Load complete.") if T(2); |
386 |
$rtype = "no error"; |
$rtype = "no error"; |
387 |
} |
} |
388 |
if ($phone) { |
if ($options->{phone}) { |
389 |
my $msgID = Tracer::SendSMS($options->{phone}, "Sprout load terminated with $rtype."); |
my $msgID = Tracer::SendSMS($options->{phone}, "Sprout load terminated with $rtype."); |
390 |
if ($msgID) { |
if ($msgID) { |
391 |
Trace("Phone message sent with ID $msgID.") if T(2); |
Trace("Phone message sent with ID $msgID.") if T(2); |
393 |
Trace("Phone message not sent.") if T(2); |
Trace("Phone message not sent.") if T(2); |
394 |
} |
} |
395 |
} |
} |
|
# If the resume flag is set, return "*", else return the group unchanged. |
|
|
sub ResumeCheck { |
|
|
my ($resume, $group) = @_; |
|
|
return ($resume ? "*" : $group); |
|
|
} |
|
396 |
|
|
397 |
1; |
1; |
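The resume handling and per-group dispatch in the script above can be exercised in isolation. This is a minimal sketch rather than the script itself: C<MockLoader> is a hypothetical stand-in for the C<SproutLoad> object, the group list is shortened, and method lookup through C<can> is swapped in for the string C<eval> the script uses.

```perl
use strict;
use warnings;

# Hypothetical stand-in for SproutLoad; each method just reports its group name.
package MockLoader;
sub new               { return bless {}, shift }
sub LoadGenomeData    { return 'Genome' }
sub LoadFeatureData   { return 'Feature' }
sub LoadSubsystemData { return 'Subsystem' }

package main;

# Shortened list of load groups in their natural order (assumption for the demo).
my @LoadGroups = qw(Genome Feature Subsystem);

# Return the groups to process. With resume on, this is the named group
# and every group after it in the natural order.
sub groups_to_load {
    my ($resume, @parameters) = @_;
    return @parameters if ! $resume;
    my $resumeGroup = $parameters[0];
    my @groups = @LoadGroups;
    # Shift out the groups until we reach the desired starting point.
    shift @groups while (@groups && $groups[0] ne $resumeGroup);
    die "Resume group \"$resumeGroup\" not found.\n" if ! @groups;
    return @groups;
}

my $spl = MockLoader->new();
for my $group (groups_to_load(1, 'Feature')) {
    # Look up the Load<Group>Data method by name instead of using a string eval.
    my $method = $spl->can("Load${group}Data")
        or die "Unknown load group $group.\n";
    print "Loaded ", $spl->$method(), "\n";
}
```

Looking the method up with C<can> rejects an unknown group name before anything is executed, whereas the string C<eval> reports the failure through C<$@> after the fact. Either approach removes the need for one C<if> block per group.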