20 |
commands. |
commands. |
21 |
|
|
22 |
LoadSproutTables -dbLoad -dbCreate "*" |
LoadSproutTables -dbLoad -dbCreate "*" |
23 |
TestSproutLoad |
TestSproutLoad [genomeID] ... |
24 |
index_sprout |
index_sprout_lucene |
25 |
|
|
26 |
|
where I<[genomeID]> is one or more genome IDs. These genomes will be tested more |
27 |
|
thoroughly than the others. |
28 |
|
|
29 |
All three commands send output to the console. In addition, C<LoadSproutTables> and |
All three commands send output to the console. In addition, C<LoadSproutTables> and |
30 |
C<TestSproutLoad> write tracing information to C<trace.log> in the FIG temporary |
C<TestSproutLoad> write tracing information to a trace log in the FIG temporary |
31 |
directory (B<$FIG_Config::Tmp>). At the bottom of the log file will be a complete |
directory (B<$FIG_Config::Tmp>). At the bottom of the log file will be a complete |
32 |
list of errors. If errors occur in C<LoadSproutTables>, then the data must be corrected |
list of errors. If errors occur in C<LoadSproutTables>, then the data must be corrected |
33 |
and the offending table group reloaded. So, for example, if there are errors in the |
and the offending table group reloaded. So, for example, if there are errors in the |
48 |
C<LoadSproutTables> takes a long time to run, so setting the trace level to 3 helps |
C<LoadSproutTables> takes a long time to run, so setting the trace level to 3 helps |
49 |
to give you an idea of the progress. |
to give you an idea of the progress. |
50 |
|
|
51 |
Once the Sprout database is loaded, B<TestSproutLoad> can be used to verify the load |
=head2 The NMPDR Web Site |
|
against the FIG data. Again, the end of the C<trace.log> file will contain a summary |
|
|
of the errors found. Like C<LoadSproutTables>, C<TestSproutLoad> is a time-consuming |
|
|
script, so you may want to set the trace level to 3 to see visible progress. |
|
|
|
|
|
TestSproutLoad -trace=3 |
|
|
|
|
|
Unlike C<LoadSproutTables>, in C<TestSproutLoad>, the individual errors found are |
|
|
mixed in with the trace messages. They are all, however, marked with a trace type |
|
|
of B<Problem>, as shown in the fragment below. |
|
|
|
|
|
11/02/2005 19:15:16 <main>: Processing feature fig|100226.1.peg.7742. |
|
|
11/02/2005 19:15:17 <main>: Processing feature fig|100226.1.peg.7741. |
|
|
11/02/2005 19:15:17 <Problem>: assignment "Short-chain dehydrodenase ... |
|
|
11/02/2005 19:15:17 <Problem>: assignment "putative oxidoreductase." ... |
|
|
11/02/2005 19:15:17 <Problem>: Incorrect assignment for fig|100226.1.peg.7741... |
|
|
11/02/2005 19:15:17 <Problem>: Incorrect number of annotations found in ... |
|
|
11/02/2005 19:15:17 <main>: Processing feature fig|100226.1.peg.7740. |
|
|
11/02/2005 19:15:18 <main>: Processing feature fig|100226.1.peg.7739. |
|
|
|
|
|
The test may reveal that some tables need to be reloaded, or that a software |
|
|
problem has crept into the Sprout. |
|
52 |
|
|
53 |
Once all the tables have the correct data, C<index_sprout> can be run to create the |
Sprout is the database engine for the NMPDR web site. The NMPDR web site consists of two |
54 |
Glimpse indexes. |
pieces that run on two different machines. The B<WEB> machine contains HTML pages |
55 |
|
generated by a Content Management Tool. |
56 |
|
|
57 |
=head2 Procedure For Loading Sprout |
=head2 Procedure For Loading Sprout |
58 |
|
|
59 |
|
In order to load the Sprout, you need to have the B<Sprout>, B<NmpdrConfigs>, and |
60 |
|
B<LuceneSearch> projects checked out from CVS in addition to the standard FIG |
61 |
|
projects. You must also set up the following B<FIG_Config.pm> variables in addition |
62 |
|
to the normal ones. |
63 |
|
|
64 |
=over 4 |
=over 4 |
65 |
|
|
66 |
=item 1 |
=item sproutData |
67 |
|
|
68 |
|
Name of the data directory for the Sprout load files. |
69 |
|
|
70 |
|
=item var |
71 |
|
|
72 |
|
Name of the directory to contain cached NMPDR pages. The most important file in |
73 |
|
this directory is C<nmpdr_page_template.html>, which contains a skeleton page |
74 |
|
from the main NMPDR web site. This skeleton page is used to generate output |
75 |
|
pages that look like the other NMPDR pages. |
76 |
|
|
77 |
|
=item java |
78 |
|
|
79 |
Type C<LoadSproutTables -dbLoad -dbCreate "*"> and press ENTER. This will create |
Path to the Java runtime environment. |
|
the C<dtx> files and load them. |
|
80 |
|
|
81 |
=item 2 |
=item sproutDB |
82 |
|
|
83 |
Type C<TestSproutLoad> and press ENTER. This will validate the Sprout database |
Name of the Sprout database. |
|
against the SEED data. |
|
84 |
|
|
85 |
=item 3 |
=item dbuser |
86 |
|
|
87 |
If any errors are detected in step (2), it is most likely due to a change in |
User name for logging into the Sprout database. |
|
SEED that did not make it to Sprout. Contact Bruce Parrello or Robert Olson |
|
|
to get the code updated properly. |
|
88 |
|
|
89 |
=item 4 |
=item dbpass |
90 |
|
|
91 |
Type C<index_sprout> and press ENTER. This will create the Glimpse indexes |
Password for logging into the Sprout database. |
92 |
for the Sprout data. |
|
93 |
|
=item nmpdr_site_url |
94 |
|
|
95 |
|
URL for the NMPDR cover pages. The NMPDR cover pages are informational and text |
96 |
|
pages that serve as the entry point to the NMPDR web site. They are generated by |
97 |
|
a Content Management tool, and some Sprout scripts need to know where to find |
98 |
|
them. |
99 |
|
|
100 |
|
=item nmpdr_site_template_id |
101 |
|
|
102 |
|
Page number for the template page used to generate results that look like they're |
103 |
|
part of the NMPDR web site. |
104 |
|
|
105 |
=back |
=back |
106 |
|
|
107 |
|
Most of the above preparation is performed by the B<NMPDRSetup> utility. |
108 |
|
NMPDRSetup prints the instructions for completing the process, including |
109 |
|
loading the Sprout database. The specific procedure for loading |
110 |
|
the Sprout data, however, is as follows. |
111 |
|
|
112 |
=head2 LoadSproutTables Command |
=head2 LoadSproutTables Command |
113 |
|
|
114 |
C<LoadSproutTables> creates the load files for Sprout tables and optionally loads them. |
C<LoadSproutTables> creates the load files for Sprout tables and optionally loads them. |
121 |
|
|
122 |
Loads B<Genome>, B<HasContig>, B<Contig>, B<IsMadeUpOf>, and B<Sequence>. |
Loads B<Genome>, B<HasContig>, B<Contig>, B<IsMadeUpOf>, and B<Sequence>. |
123 |
|
|
|
=item Coupling |
|
|
|
|
|
Loads B<Coupling>, B<IsEvidencedBy>, B<PCH>, B<ParticipatesInCoupling>, |
|
|
B<UsesAsEvidence>. |
|
|
|
|
124 |
=item Feature |
=item Feature |
125 |
|
|
126 |
Loads B<Feature>, B<FeatureAlias>, B<FeatureTranslation>, B<FeatureUpstream>, |
Loads B<Feature>, B<FeatureAlias>, B<FeatureTranslation>, B<FeatureUpstream>, |
127 |
B<IsLocatedIn>, B<FeatureLink>. |
B<IsLocatedIn>, B<FeatureLink>, B<IsAliasOf>, B<CDD>, B<HasFeature>, |
128 |
|
B<HasRoleInSubsystem>, B<FeatureEssential>, B<FeatureVirulent>, B<FeatureIEDB>, |
129 |
|
B<CDD>, and B<IsPresentOnProteinOf>. |
130 |
|
|
131 |
=item Subsystem |
=item Subsystem |
132 |
|
|
133 |
Loads B<Subsystem>, B<Role>, B<SSCell>, B<ContainsFeature>, B<IsGenomeOf>, |
Loads B<Subsystem>, B<Role>, B<SSCell>, B<ContainsFeature>, B<IsGenomeOf>, |
134 |
B<IsRoleOf>, B<OccursInSubsystem>, B<ParticipatesIn>, B<HasSSCell>, |
B<IsRoleOf>, B<OccursInSubsystem>, B<ParticipatesIn>, B<HasSSCell>, |
135 |
B<Catalyzes>, B<ConsistsOfRoles>, B<RoleSubset>, B<HasRoleSubset>, |
B<ConsistsOfRoles>, B<RoleSubset>, B<HasRoleSubset>, |
136 |
B<ConsistsOfGenomes>, B<GenomeSubset>, B<HasGenomeSubset>, B<Diagram>, |
B<ConsistsOfGenomes>, B<GenomeSubset>, B<HasGenomeSubset>, B<Diagram>, |
137 |
B<RoleOccursIn>. |
B<RoleOccursIn>, B<SubSystemClass>, B<RoleEC>, B<IsIdentifiedByEC>, and |
138 |
|
B<ContainsFeature>. |
139 |
|
|
140 |
=item Annotation |
=item Annotation |
141 |
|
|
142 |
Loads B<SproutUser>, B<UserAccess>, B<Annotation>, B<IsTargetOfAnnotation>, |
Loads B<SproutUser>, B<UserAccess>, B<Annotation>, B<IsTargetOfAnnotation>, and |
143 |
B<MadeAnnotation>. |
B<MadeAnnotation>. |
144 |
|
|
145 |
=item Property |
=item Property |
146 |
|
|
147 |
Loads B<Property>, B<HasProperty>. |
Loads B<Property> and B<HasProperty>. |
|
|
|
|
=item BBH |
|
|
|
|
|
Loads B<IsBidirectionalBestHitOf>. |
|
148 |
|
|
149 |
=item Group |
=item Group |
150 |
|
|
152 |
|
|
153 |
=item Source |
=item Source |
154 |
|
|
155 |
Loads B<Source>, B<ComesFrom>, B<SourceURL>. |
Loads B<Source>, B<ComesFrom>, and B<SourceURL>. |
|
|
|
|
=item External |
|
|
|
|
|
Loads B<ExternalAliasOrg>, B<ExternalAliasFunc>. |
|
156 |
|
|
157 |
=item Reaction |
=item Reaction |
158 |
|
|
159 |
Loads B<ReactionURL>, B<Compound>, B<CompoundName>, |
Loads B<ReactionURL>, B<Compound>, B<CompoundName>, |
160 |
B<CompoundCAS>, B<IsAComponentOf>, B<Reaction>. |
B<CompoundCAS>, B<IsAComponentOf>, B<Reaction>, B<Scenario>, B<IsInputFor>, |
161 |
|
B<IsOutputOf>, B<IsOnDiagram>, and B<Catalyzes>. |
162 |
|
|
163 |
|
=item Synonym |
164 |
|
|
165 |
|
Loads B<SynonymGroup> and B<IsSynonymGroupFor>. |
166 |
|
|
167 |
|
=item Family |
168 |
|
|
169 |
|
Loads B<Family> and B<IsFamilyForFeature>. |
170 |
|
|
171 |
|
=item Drug |
172 |
|
|
173 |
|
Loads B<PDB>, B<DocksWith>, B<IsProteinForFeature>, and B<Ligand>. |
174 |
|
|
175 |
=item * |
=item * |
176 |
|
|
187 |
The name of the file containing the genomes and their associated access codes. The |
The name of the file containing the genomes and their associated access codes. The |
188 |
file should have one line per genome, each line consisting of the genome ID followed |
file should have one line per genome, each line consisting of the genome ID followed |
189 |
by the access code, separated by a tab. If no file is specified, all complete genomes |
by the access code, separated by a tab. If no file is specified, all complete genomes |
190 |
will be processed and the access code will be 1. |
will be processed and the access code will be 1. Specify C<default> to use the |
191 |
|
default gene file, C<genes.tbl> in the C<SproutData> directory. |
192 |
|
|
193 |
=item subsysFile |
=item subsysFile |
194 |
|
|
199 |
|
|
200 |
Desired tracing level. The default is 3. |
Desired tracing level. The default is 3. |
201 |
|
|
202 |
=item limitedFeatures |
=item user |
203 |
|
|
204 |
Only generate the B<Feature> and B<IsLocatedIn> tables when processing the feature group. |
Suffix to use for trace, output, and error files created. |
205 |
|
|
206 |
=item dbLoad |
=item dbLoad |
207 |
|
|
212 |
If TRUE, the database will be created. If the database exists already, it will be |
If TRUE, the database will be created. If the database exists already, it will be |
213 |
dropped. Use this option with caution. |
dropped. Use this option with caution. |
214 |
|
|
215 |
|
=item loadOnly |
216 |
|
|
217 |
|
If TRUE, the database tables will be loaded from existing load files. Load files |
218 |
|
will not be created. This option is useful if you are setting up a copy of Sprout |
219 |
|
and have load files already set up from the original version. |
220 |
|
|
221 |
|
=item background |
222 |
|
|
223 |
|
Redirect the standard and error output to files in the FIG temporary directory. |
224 |
|
|
225 |
|
=item resume |
226 |
|
|
227 |
|
Resume an interrupted load, starting with the load group specified in the first |
228 |
|
positional parameter. |
229 |
|
|
230 |
|
=item sql |
231 |
|
|
232 |
|
Trace SQL statements. |
233 |
|
|
234 |
|
=item phone |
235 |
|
|
236 |
|
Phone number to message when the load finishes. |
237 |
|
|
238 |
=back |
=back |
239 |
|
|
240 |
=cut |
=cut |
241 |
|
|
242 |
use strict; |
use strict; |
243 |
use Tracer; |
use Tracer; |
|
use DocUtils; |
|
244 |
use Cwd; |
use Cwd; |
245 |
use FIG; |
use FIG; |
246 |
use SFXlate; |
use SFXlate; |
250 |
use Stats; |
use Stats; |
251 |
use SFXlate; |
use SFXlate; |
252 |
|
|
253 |
|
# This is a list of the load groups in their natural order. We'll go through these in sequence, processing |
254 |
|
# the ones the user asks for. |
255 |
|
my @LoadGroups = qw(Genome Subsystem Property Annotation Source Reaction Synonym Family Drug Feature); |
256 |
|
|
257 |
# Get the command-line parameters and options. |
# Get the command-line parameters and options. |
258 |
my ($options, @parameters) = Tracer::ParseCommand({ geneFile => "", subsysFile => "", |
my ($options, @parameters) = StandardSetup(['SproutLoad', 'ERDBLoad', 'Stats', |
259 |
trace => 3, limitedFeatures => 0, |
'ERDB', 'Load', 'Sprout', 'Subsystem'], |
260 |
dbLoad => 0, dbCreate => 0 }, @ARGV); |
{ geneFile => ["", "name of the genome list file"], |
261 |
# Set up tracing. |
subsysFile => ["", "name of the trusted subsystem file"], |
262 |
TSetup("$options->{trace} SproutLoad ERDBLoad ERDB Stats Tracer Load", "+>$FIG_Config::temp/trace.log"); |
dbLoad => [0, "load the database from generated files"], |
263 |
|
dbCreate => [0, "drop and re-create the database"], |
264 |
|
loadOnly => [0, "load the database from previously generated files"], |
265 |
|
resume => [0, "resume a complete load starting with the first group specified in the parameter list"], |
266 |
|
phone => ["", "phone number (international format) to call when load finishes"], |
267 |
|
}, |
268 |
|
"<group1> <group2> ...", |
269 |
|
@ARGV); |
270 |
|
# If we're doing a load-only, turn on loading. |
271 |
|
if ($options->{loadOnly}) { |
272 |
|
$options->{dbLoad} = 1; |
273 |
|
} |
274 |
if ($options->{dbCreate}) { |
if ($options->{dbCreate}) { |
275 |
# Here we want to drop and re-create the database. |
# Here we want to drop and re-create the database. |
276 |
my $db = $FIG_Config::sproutDB; |
my $db = $FIG_Config::sproutDB; |
277 |
if ($FIG_Config::dbms eq "Pg") { |
DBKernel::CreateDB($db); |
|
my $dbport = $FIG_Config::dbport; |
|
|
my $dbuser = $FIG_Config::dbuser; |
|
|
Trace("Dropping old database (failure is okay).") if T(2); |
|
|
system("dropdb -p $dbport -U $dbuser $db"); |
|
|
Trace("Dropping old database (failure is okay).") if T(2); |
|
|
&FIG::run("createdb -p $dbport -U $dbuser $db"); |
|
|
} elsif ($FIG_Config::dbms eq "mysql") { |
|
|
Trace("Dropping old database (failure is okay).") if T(2); |
|
|
system("mysqladmin -u $FIG_Config::dbuser -p drop $db"); |
|
|
&FIG::run("mysqladmin -u $FIG_Config::dbuser -p create $db"); |
|
|
Trace("Dropping old database (failure is okay).") if T(2); |
|
278 |
} |
} |
279 |
|
# Compute the gene file name. |
280 |
|
my $geneFile = $options->{geneFile}; |
281 |
|
if ($geneFile eq 'default') { |
282 |
|
$geneFile = "$FIG_Config::sproutData/genes.tbl"; |
283 |
} |
} |
284 |
# Create the sprout loader object. Note that the Sprout object does not |
# Create the sprout loader object. Note that the Sprout object does not |
285 |
# open the database unless the "dbLoad" option is turned on. |
# open the database unless the "dbLoad" option is turned on. |
286 |
my $fig = FIG->new(); |
my $fig = FIG->new(); |
287 |
my $sprout = SFXlate->new_sprout_only(undef, undef, undef, ! $options->{dbLoad}); |
my $sprout = SFXlate->new_sprout_only(undef, undef, undef, ! $options->{dbLoad}); |
288 |
my $spl = SproutLoad->new($sprout, $fig, $options->{geneFile}, $options->{subsysFile}, $options); |
my $spl = SproutLoad->new($sprout, $fig, $geneFile, $options->{subsysFile}, $options); |
289 |
# Ensure we have an output directory. |
# Ensure we have an output directory. |
290 |
FIG::verify_dir($FIG_Config::sproutData); |
FIG::verify_dir($FIG_Config::sproutData); |
291 |
|
# Check for the "*" option. |
292 |
|
if (@parameters && $parameters[0] eq '*') { |
293 |
|
@parameters = @LoadGroups; |
294 |
|
} |
295 |
|
# If we're resuming, we only want to have 1 parameter. |
296 |
|
my $resume = $options->{resume}; |
297 |
|
if ($resume && @parameters > 1) { |
298 |
|
Confess("If resume=1, only one load group can be specified."); |
299 |
|
} elsif (! @parameters) { |
300 |
|
Trace("No load groups were specified.") if T(0); |
301 |
|
} |
302 |
|
# Process the resume option here. We modify the incoming parameters to |
303 |
|
# contain the resume group and everything after it. |
304 |
|
if ($resume) { |
305 |
|
# Save the starting group. |
306 |
|
my $resumeGroup = $parameters[0]; |
307 |
|
# Copy the load group list into the parameter array. |
308 |
|
@parameters = @LoadGroups; |
309 |
|
# Shift out the groups until we reach our desired starting point. |
310 |
|
while (scalar(@parameters) && $parameters[0] ne $resumeGroup) { |
311 |
|
shift @parameters; |
312 |
|
} |
313 |
|
if (! @parameters) { |
314 |
|
Confess("Resume group \"$resumeGroup\" not found."); |
315 |
|
} |
316 |
|
} |
317 |
|
# Set a variable to contain return type information. |
318 |
|
my $rtype; |
319 |
|
# Set up a statistics object for statistics about the entire load. |
320 |
|
my $totalStats = Stats->new(); |
321 |
|
# Ensure we catch errors. |
322 |
|
eval { |
323 |
# Process the parameters. |
# Process the parameters. |
324 |
for my $group (@parameters) { |
for my $group (@parameters) { |
325 |
Trace("Processing load group $group.") if T(2); |
Trace("Processing load group $group.") if T(2); |
326 |
my $stats; |
# Compute the string we want to execute. |
327 |
if ($group eq 'Genome' || $group eq '*') { |
my $code = "\$spl->Load${group}Data()"; |
328 |
$spl->LoadGenomeData(); |
# Load this group. |
329 |
} |
my $stats = eval($code); |
330 |
if ($group eq 'Feature' || $group eq '*') { |
if ($@) { |
331 |
$spl->LoadFeatureData(); |
Confess("Load group error: $@"); |
332 |
} |
} |
333 |
if ($group eq 'Coupling' || $group eq '*') { |
# Merge the statistics into the master. |
334 |
$spl->LoadCouplingData(); |
$totalStats->Accumulate($stats); |
335 |
} |
} |
336 |
if ($group eq 'Subsystem' || $group eq '*') { |
# Compute the statistical display. |
337 |
$spl->LoadSubsystemData(); |
my $statDisplay = $totalStats->Show(); |
338 |
} |
# Display it. |
339 |
if ($group eq 'Property' || $group eq '*') { |
Trace("Statistics for this load:\n$statDisplay") if T(2); |
340 |
$spl->LoadPropertyData(); |
# Check for a "table load failed" message. If we find one, we want |
341 |
} |
# to end with an error. |
342 |
if ($group eq 'Annotation' || $group eq '*') { |
if ($statDisplay =~ /table load failed/i) { |
343 |
$spl->LoadAnnotationData(); |
Confess("One or more table loads failed."); |
344 |
} |
} |
345 |
if ($group eq 'BBH' || $group eq '*') { |
}; |
346 |
$spl->LoadBBHData(); |
if ($@) { |
347 |
} |
Trace("Load failed with error: $@") if T(0); |
348 |
if ($group eq 'Group' || $group eq '*') { |
$rtype = "error"; |
349 |
$spl->LoadGroupData(); |
} else { |
350 |
} |
Trace("Load complete.") if T(2); |
351 |
if ($group eq 'Source' || $group eq '*') { |
$rtype = "no error"; |
|
$spl->LoadSourceData(); |
|
|
} |
|
|
if ($group eq 'External' || $group eq '*') { |
|
|
$spl->LoadExternalData(); |
|
352 |
} |
} |
353 |
if ($group eq 'Reaction' || $group eq '*') { |
if ($options->{phone}) { |
354 |
$spl->LoadReactionData(); |
my $msgID = Tracer::SendSMS($options->{phone}, "Sprout load terminated with $rtype."); |
355 |
|
if ($msgID) { |
356 |
|
Trace("Phone message sent with ID $msgID.") if T(2); |
357 |
|
} else { |
358 |
|
Trace("Phone message not sent.") if T(2); |
359 |
} |
} |
|
|
|
360 |
} |
} |
|
Trace("Load complete.") if T(2); |
|
361 |
|
|
362 |
1; |
1; |
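
The resume handling and per-group dispatch in the new revision can be tried out in isolation. The following is a minimal sketch, not the script's actual code: C<groups_from>, C<run_groups>, and the C<DemoLoader> package are hypothetical names introduced here so the example runs without FIG or Sprout installed. It also swaps the string C<eval> used in the script for a C<can> lookup, which is one possible alternative rather than the author's method.

```perl
use strict;
use warnings;

# Load groups in their natural order, copied from the script above.
my @LoadGroups = qw(Genome Subsystem Property Annotation Source Reaction
                    Synonym Family Drug Feature);

# Hypothetical helper: return the resume group and everything after it,
# or an empty list if the group name is not in the list.
sub groups_from {
    my ($resumeGroup, @groups) = @_;
    shift @groups while @groups && $groups[0] ne $resumeGroup;
    return @groups;
}

# Hypothetical stand-in for SproutLoad. Each Load<Group>Data method just
# records that it was called.
{
    package DemoLoader;
    sub new { my ($class) = @_; return bless { loaded => [] }, $class; }
    sub LoadFamilyData  { push @{ $_[0]{loaded} }, 'Family';  return 1; }
    sub LoadDrugData    { push @{ $_[0]{loaded} }, 'Drug';    return 1; }
    sub LoadFeatureData { push @{ $_[0]{loaded} }, 'Feature'; return 1; }
}

# Hypothetical helper: look up each group's method by name and call it,
# dying on an unknown group instead of evaluating a code string.
sub run_groups {
    my ($loader, @groups) = @_;
    for my $group (@groups) {
        my $method = $loader->can("Load${group}Data")
            or die "Unknown load group \"$group\".\n";
        $loader->$method();
    }
    return @{ $loader->{loaded} };
}

my @todo = groups_from('Family', @LoadGroups);
my @done = run_groups(DemoLoader->new(), @todo);
print join(' ', @done), "\n";   # prints "Family Drug Feature"
```

Resuming from C<Family> trims the list to C<Family>, C<Drug>, and C<Feature>, which matches the order in C<@LoadGroups>. An unknown resume group yields an empty list, the case the script reports with C<Confess>.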