2 |
|
|
3 |
=head1 Load Sprout Tables |
=head1 Load Sprout Tables |
4 |
|
|
5 |
Create the load files for a group of Sprout tables. The parameters are the names of |
=head2 Introduction |
6 |
the table groups whose data is to be created. The legal table group names are given below. |
|
7 |
|
The Sprout database reflects a snapshot of the SEED taken at a particular point in |
8 |
|
time. At some point in the future, it will be possible to add annotations to the |
9 |
|
Sprout data. All records added to Sprout after the snapshot is taken are |
10 |
|
specially marked so that the changes can be copied to the SEED. The SEED remains |
11 |
|
the live version of the data. |
12 |
|
|
13 |
|
The snapshot is produced by reading the SEED data and writing it to sequential |
14 |
|
files. There is one file per Sprout table, and each such file's name consists of |
15 |
|
the table name with the suffix C<dtx>. Thus, the file for the C<Genome> table |
16 |
|
would be named C<Genome.dtx>. These files are used to load the actual Sprout |
17 |
|
database and to generate Glimpse indexes. |
18 |
|
|
19 |
|
To load all the Sprout tables and then validate the result, you need to issue three |
20 |
|
commands. |
21 |
|
|
22 |
|
LoadSproutTables -dbLoad -dbCreate "*" |
23 |
|
TestSproutLoad |
24 |
|
index_sprout |
25 |
|
|
26 |
|
All three commands send output to the console. In addition, C<LoadSproutTables> and |
27 |
|
C<TestSproutLoad> write tracing information to C<trace.log> in the FIG temporary |
28 |
|
directory (B<$FIG_Config::temp>). At the bottom of the log file will be a complete |
29 |
|
list of errors. If errors occur in C<LoadSproutTables>, then the data must be corrected |
30 |
|
and the offending table group reloaded. So, for example, if there are errors in the |
31 |
|
load of the B<MadeAnnotation> and B<Compound> tables, you would need to run |
32 |
|
|
33 |
|
LoadSproutTables -dbLoad Annotation Reaction |
34 |
|
|
35 |
|
because B<MadeAnnotation> is in the C<Annotation> group, and B<Compound> is in the |
36 |
|
C<Reaction> group. A list of the groups is given below. |
37 |
|
|
38 |
|
You can omit the C<dbLoad> option to create the load files without |
39 |
|
loading the database, and you can add a C<trace> option to change the trace level. |
40 |
|
The command below creates the Genome-related load files with a trace level of 3 and |
41 |
|
does not load them into the Sprout database. |
42 |
|
|
43 |
|
LoadSproutTables -trace=3 Genome |
44 |
|
|
45 |
|
C<LoadSproutTables> takes a long time to run, so setting the trace level to 3 helps |
46 |
|
to give you an idea of the progress. |
47 |
|
|
48 |
|
Once the Sprout database is loaded, C<TestSproutLoad> can be used to verify the load |
49 |
|
against the FIG data. Again, the end of the C<trace.log> file will contain a summary |
50 |
|
of the errors found. Like C<LoadSproutTables>, C<TestSproutLoad> is a time-consuming |
51 |
|
script, so you may want to set the trace level to 3 to see visible progress. |
52 |
|
|
53 |
|
TestSproutLoad -trace=3 |
54 |
|
|
55 |
|
Unlike C<LoadSproutTables>, in C<TestSproutLoad>, the individual errors found are |
56 |
|
mixed in with the trace messages. They are all, however, marked with a trace type |
57 |
|
of B<Problem>, as shown in the fragment below. |
58 |
|
|
59 |
|
11/02/2005 19:15:16 <main>: Processing feature fig|100226.1.peg.7742. |
60 |
|
11/02/2005 19:15:17 <main>: Processing feature fig|100226.1.peg.7741. |
61 |
|
11/02/2005 19:15:17 <Problem>: assignment "Short-chain dehydrodenase ... |
62 |
|
11/02/2005 19:15:17 <Problem>: assignment "putative oxidoreductase." ... |
63 |
|
11/02/2005 19:15:17 <Problem>: Incorrect assignment for fig|100226.1.peg.7741... |
64 |
|
11/02/2005 19:15:17 <Problem>: Incorrect number of annotations found in ... |
65 |
|
11/02/2005 19:15:17 <main>: Processing feature fig|100226.1.peg.7740. |
66 |
|
11/02/2005 19:15:18 <main>: Processing feature fig|100226.1.peg.7739. |
67 |
|
|
68 |
|
The test may reveal that some tables need to be reloaded, or that a software |
69 |
|
problem has crept into Sprout. |
70 |
|
|
71 |
|
Once all the tables have the correct data, C<index_sprout> can be run to create the |
72 |
|
Glimpse indexes. |
73 |
|
|
74 |
|
=head2 Procedure For Loading Sprout |
75 |
|
|
76 |
|
=over 4 |
77 |
|
|
78 |
|
=item 1 |
79 |
|
|
80 |
|
Type C<LoadSproutTables -dbLoad -dbCreate "*"> and press ENTER. This will create |
81 |
|
the C<dtx> files and load them. |
82 |
|
|
83 |
|
=item 2 |
84 |
|
|
85 |
|
Type C<TestSproutLoad> and press ENTER. This will validate the Sprout database |
86 |
|
against the SEED data. |
87 |
|
|
88 |
|
=item 3 |
89 |
|
|
90 |
|
If any errors are detected in step (2), they are most likely due to a change in |
91 |
|
SEED that did not make it to Sprout. Contact Bruce Parrello or Robert Olson |
92 |
|
to get the code updated properly. |
93 |
|
|
94 |
|
=item 4 |
95 |
|
|
96 |
|
Type C<index_sprout> and press ENTER. This will create the Glimpse indexes |
97 |
|
for the Sprout data. |
98 |
|
|
99 |
|
=back |
100 |
|
|
101 |
|
=head2 LoadSproutTables Command |
102 |
|
|
103 |
|
C<LoadSproutTables> creates the load files for Sprout tables and optionally loads them. |
104 |
|
The parameters are the names of the table groups whose data is to be created. |
105 |
|
The legal table group names are given below. |
106 |
|
|
107 |
=over 4 |
=over 4 |
108 |
|
|
124 |
|
|
125 |
Loads B<Subsystem>, B<Role>, B<SSCell>, B<ContainsFeature>, B<IsGenomeOf>, |
Loads B<Subsystem>, B<Role>, B<SSCell>, B<ContainsFeature>, B<IsGenomeOf>, |
126 |
B<IsRoleOf>, B<OccursInSubsystem>, B<ParticipatesIn>, B<HasSSCell>, |
B<IsRoleOf>, B<OccursInSubsystem>, B<ParticipatesIn>, B<HasSSCell>, |
127 |
B<Catalyzes>, B<Reaction>, B<ConsistsOfRoles>, B<RoleSubset>, B<HasRoleSubset>, |
B<Catalyzes>, B<ConsistsOfRoles>, B<RoleSubset>, B<HasRoleSubset>, |
128 |
B<ConsistsOfGenomes>, B<GenomeSubset>, B<HasGenomeSubset> |
B<ConsistsOfGenomes>, B<GenomeSubset>, B<HasGenomeSubset>, B<Diagram>, |
129 |
|
B<RoleOccursIn>. |
130 |
|
|
131 |
=item Annotation |
=item Annotation |
132 |
|
|
133 |
Loads B<SproutUser>, B<UserAccess>, B<Annotation>, B<IsTargetOfAnnotation>, |
Loads B<SproutUser>, B<UserAccess>, B<Annotation>, B<IsTargetOfAnnotation>, |
134 |
B<MadeAnnotation>. |
B<MadeAnnotation>. |
135 |
|
|
|
=item Diagram |
|
|
|
|
|
Loads B<Diagram>, B<RoleOccursIn>. |
|
|
|
|
136 |
=item Property |
=item Property |
137 |
|
|
138 |
Loads B<Property>, B<HasProperty>. |
Loads B<Property>, B<HasProperty>. |
156 |
=item Reaction |
=item Reaction |
157 |
|
|
158 |
Loads B<ReactionURL>, B<Compound>, B<CompoundName>, |
Loads B<ReactionURL>, B<Compound>, B<CompoundName>, |
159 |
B<CompoundCAS>, B<IsAComponentOf>. |
B<CompoundCAS>, B<IsAComponentOf>, B<Reaction>. |
160 |
|
|
161 |
=item * |
=item * |
162 |
|
|
188 |
|
|
189 |
Only generate the B<Feature> and B<IsLocatedIn> tables when processing the feature group. |
Only generate the B<Feature> and B<IsLocatedIn> tables when processing the feature group. |
190 |
|
|
191 |
|
=item dbLoad |
192 |
|
|
193 |
|
If TRUE, the database tables will be loaded automatically from the load files created. |
194 |
|
|
195 |
|
=item dbCreate |
196 |
|
|
197 |
|
If TRUE, the database will be created. If the database exists already, it will be |
198 |
|
dropped. Use this option with caution. |
199 |
|
|
200 |
=back |
=back |
201 |
|
|
202 |
=cut |
=cut |
211 |
use File::Path; |
use File::Path; |
212 |
use SproutLoad; |
use SproutLoad; |
213 |
use Stats; |
use Stats; |
214 |
|
use SFXlate; |
215 |
|
|
216 |
# Get the command-line parameters and options. |
# Get the command-line parameters and options. |
217 |
my ($options, @parameters) = Tracer::ParseCommand({ geneFile => "", subsysFile => "", |
my ($options, @parameters) = Tracer::ParseCommand({ geneFile => "", subsysFile => "", |
218 |
trace => 3, limitedFeatures => 0 }, |
trace => 3, limitedFeatures => 0, |
219 |
@ARGV); |
dbLoad => 0, dbCreate => 0 }, @ARGV); |
220 |
# Set up tracing. |
# Set up tracing. |
221 |
TSetup("$options->{trace} SproutLoad ERDBLoad ERDB Stats Tracer Load", "+>$FIG_Config::temp/trace.log"); |
TSetup("$options->{trace} SproutLoad ERDBLoad ERDB Stats Tracer Load", "+>$FIG_Config::temp/trace.log"); |
222 |
# Create the sprout loader object. |
if ($options->{dbCreate}) { |
223 |
|
# Here we want to drop and re-create the database. |
224 |
|
my $db = $FIG_Config::sproutDB; |
225 |
|
if ($FIG_Config::dbms eq "Pg") { |
226 |
|
my $dbport = $FIG_Config::dbport; |
227 |
|
my $dbuser = $FIG_Config::dbuser; |
228 |
|
Trace("Dropping old database (failure is okay).") if T(2); |
229 |
|
system("dropdb -p $dbport -U $dbuser $db"); |
230 |
|
Trace("Creating new database.") if T(2); |
231 |
|
&FIG::run("createdb -p $dbport -U $dbuser $db"); |
232 |
|
} elsif ($FIG_Config::dbms eq "mysql") { |
233 |
|
Trace("Dropping old database (failure is okay).") if T(2); |
234 |
|
system("mysqladmin -u $FIG_Config::dbuser -p drop $db"); |
235 |
|
Trace("Creating new database.") if T(2); |
236 |
|
&FIG::run("mysqladmin -u $FIG_Config::dbuser -p create $db"); |
237 |
|
} |
238 |
|
|
239 |
|
} |
240 |
|
# Create the sprout loader object. Note that the Sprout object does not |
241 |
|
# open the database unless the "dbLoad" option is turned on. |
242 |
my $fig = FIG->new(); |
my $fig = FIG->new(); |
243 |
my $sprout = Sprout->new($FIG_Config::sproutDB, { noDBOpen => 1 }); |
my $sprout = SFXlate->new_sprout_only(undef, undef, undef, ! $options->{dbLoad}); |
244 |
my $spl = SproutLoad->new($sprout, $fig, $options->{geneFile}, $options->{subsysFile}, $options); |
my $spl = SproutLoad->new($sprout, $fig, $options->{geneFile}, $options->{subsysFile}, $options); |
245 |
|
# Ensure we have an output directory. |
246 |
|
FIG::verify_dir($FIG_Config::sproutData); |
247 |
# Process the parameters. |
# Process the parameters. |
248 |
for my $group (@parameters) { |
for my $group (@parameters) { |
249 |
Trace("Processing load group $group.") if T(2); |
Trace("Processing load group $group.") if T(2); |
263 |
if ($group eq 'Property' || $group eq '*') { |
if ($group eq 'Property' || $group eq '*') { |
264 |
$spl->LoadPropertyData(); |
$spl->LoadPropertyData(); |
265 |
} |
} |
|
if ($group eq 'Diagram' || $group eq '*') { |
|
|
$spl->LoadDiagramData(); |
|
|
} |
|
266 |
if ($group eq 'Annotation' || $group eq '*') { |
if ($group eq 'Annotation' || $group eq '*') { |
267 |
$spl->LoadAnnotationData(); |
$spl->LoadAnnotationData(); |
268 |
} |
} |