
Annotation of /Sprout/LoadSproutTables.pl

Revision 1.12

#!/usr/bin/perl -w

=head1 Load Sprout Tables

=head2 Introduction

This script creates the load files for Sprout tables and optionally loads them.
The parameters are the names of the table groups whose data is to be created.
The legal table group names are given below.

=over 4

=item Genome

Loads B<Genome>, B<HasContig>, B<Contig>, B<IsMadeUpOf>, and B<Sequence>.

=item Coupling

Loads B<Coupling>, B<IsEvidencedBy>, B<PCH>, B<ParticipatesInCoupling>,
and B<UsesAsEvidence>.

=item Feature

Loads B<Feature>, B<FeatureAlias>, B<FeatureTranslation>, B<FeatureUpstream>,
B<IsLocatedIn>, and B<FeatureLink>.

=item Subsystem

Loads B<Subsystem>, B<Role>, B<SSCell>, B<ContainsFeature>, B<IsGenomeOf>,
B<IsRoleOf>, B<OccursInSubsystem>, B<ParticipatesIn>, B<HasSSCell>,
B<Catalyzes>, B<ConsistsOfRoles>, B<RoleSubset>, B<HasRoleSubset>,
B<ConsistsOfGenomes>, B<GenomeSubset>, and B<HasGenomeSubset>.

=item Annotation

Loads B<SproutUser>, B<UserAccess>, B<Annotation>, B<IsTargetOfAnnotation>,
and B<MadeAnnotation>.

=item Diagram

Loads B<Diagram> and B<RoleOccursIn>.

=item Property

Loads B<Property> and B<HasProperty>.

=item BBH

Loads B<IsBidirectionalBestHitOf>.

=item Group

Loads B<GenomeGroups>.

=item Source

Loads B<Source>, B<ComesFrom>, and B<SourceURL>.

=item External

Loads B<ExternalAliasOrg> and B<ExternalAliasFunc>.

=item Reaction

Loads B<ReactionURL>, B<Compound>, B<CompoundName>,
B<CompoundCAS>, B<IsAComponentOf>, and B<Reaction>.

=item *

Loads all of the above tables.

=back

The command-line options are given below.

=over 4

=item geneFile

The name of the file containing the genomes and their associated access codes. The
file should have one line per genome, each line consisting of the genome ID followed
by the access code, separated by a tab. If no file is specified, all complete genomes
will be processed and the access code will be 1.

=item subsysFile

The name of the file containing the trusted subsystems. The file should have one line
per trusted subsystem. If no file is specified, all subsystems will be trusted.

=item trace

Desired tracing level. The default is 3.

=item limitedFeatures

Only generate the B<Feature> and B<IsLocatedIn> tables when processing the feature group.

=item dbLoad

If TRUE, the database tables will be loaded automatically from the load files created.

=back

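The C<geneFile> layout described above (genome ID, tab, access code, one genome per
line) can be read with a few lines of Perl. The block below is only a sketch of that
format: C<parse_gene_lines> is a hypothetical helper, not part of C<SproutLoad>, and
the second genome ID and both access codes are made-up sample values.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SproutLoad): parse geneFile-style lines,
# each holding a genome ID and an access code separated by a tab.
sub parse_gene_lines {
    my @lines = @_;
    my %accessCodes;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        my ($genomeID, $accessCode) = split /\t/, $line;
        die "Malformed geneFile line: '$line'\n"
            unless defined $genomeID && defined $accessCode;
        $accessCodes{$genomeID} = $accessCode;
    }
    return \%accessCodes;
}

# Sample input: illustrative values only.
my $codes = parse_gene_lines("100226.1\t1\n", "83333.1\t2\n");
print "$_ => $codes->{$_}\n" for sort keys %$codes;
```

A line without a tab is treated as an error here; whether the real loader is that
strict is not stated in this documentation.
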
=head2 Usage

To load all the Sprout tables and then validate the result, you need to issue three
commands.

    LoadSproutTables -dbLoad "*"
    TestSproutLoad
    index_sprout

All three commands send output to the console. In addition, C<LoadSproutTables> and
C<TestSproutLoad> write tracing information to C<trace.log> in the FIG temporary
directory (B<$FIG_Config::temp>). At the bottom of the log file will be a complete
list of errors. If errors occur in C<LoadSproutTables>, then the data must be corrected
and the offending table group reloaded. So, for example, if there are errors in the
load of the B<MadeAnnotation> and B<Compound> tables, you would need to run

    LoadSproutTables -dbLoad Annotation Reaction

because B<MadeAnnotation> is in the C<Annotation> group, and B<Compound> is in the
C<Reaction> group. You can omit the C<dbLoad> option to create the load files without
loading the database, and you can add a C<trace> option to change the trace level.
The command below creates the Genome-related load files with a trace level of 3 and
does not load them into the Sprout database.

    LoadSproutTables -trace=3 Genome

C<LoadSproutTables> takes a long time to run, so setting the trace level to 3 helps
to give you an idea of the progress.

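The commands above mix options (C<-dbLoad>, C<-trace=3>) with table group names. The
sketch below illustrates one way such a command line could be split into options and
parameters. It is an assumption-laden stand-in, not the code of C<Tracer::ParseCommand>:
the name C<parse_command_sketch> is hypothetical, and the rule that a bare C<-name>
means TRUE (1) is assumed from the usage examples rather than confirmed.

```perl
use strict;
use warnings;

# Illustrative stand-in for option parsing; NOT Tracer::ParseCommand itself.
# Assumes "-name=value" sets a value and a bare "-name" means TRUE (1).
sub parse_command_sketch {
    my ($defaults, @args) = @_;
    my %options = %$defaults;    # start from the default option values
    my @parameters;
    for my $arg (@args) {
        if ($arg =~ /^-([^=]+)=(.*)$/) {
            $options{$1} = $2;       # -name=value
        } elsif ($arg =~ /^-(.+)$/) {
            $options{$1} = 1;        # bare -name
        } else {
            push @parameters, $arg;  # table group name
        }
    }
    return (\%options, @parameters);
}

my ($options, @groups) = parse_command_sketch(
    { trace => 3, dbLoad => 0 }, '-dbLoad', 'Annotation', 'Reaction');
print "dbLoad=$options->{dbLoad} trace=$options->{trace} groups=@groups\n";
```

Under these assumptions, anything that does not start with a hyphen, including the
quoted C<"*">, is passed through as a table group name.
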
Once the Sprout database is loaded, C<TestSproutLoad> can be used to verify the load
against the FIG data. Again, the end of the C<trace.log> file will contain a summary
of the errors found. Like C<LoadSproutTables>, C<TestSproutLoad> is a time-consuming
script, so you may want to set the trace level to 3 to see its progress.

    TestSproutLoad -trace=3

Unlike C<LoadSproutTables>, C<TestSproutLoad> mixes the individual errors it finds
in with the trace messages. They are all, however, marked with a trace type
of B<Problem>, as shown in the fragment below.

    11/02/2005 19:15:16 <main>: Processing feature fig|100226.1.peg.7742.
    11/02/2005 19:15:17 <main>: Processing feature fig|100226.1.peg.7741.
    11/02/2005 19:15:17 <Problem>: assignment "Short-chain dehydrodenase ...
    11/02/2005 19:15:17 <Problem>: assignment "putative oxidoreductase." ...
    11/02/2005 19:15:17 <Problem>: Incorrect assignment for fig|100226.1.peg.7741...
    11/02/2005 19:15:17 <Problem>: Incorrect number of annotations found in ...
    11/02/2005 19:15:17 <main>: Processing feature fig|100226.1.peg.7740.
    11/02/2005 19:15:18 <main>: Processing feature fig|100226.1.peg.7739.

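Because the problem lines are interleaved with ordinary trace output, it can help to
filter them out. The sketch below assumes the "date time <type>: message" layout seen
in the fragment above; C<problem_lines> is a hypothetical helper and is not part of
C<TestSproutLoad> or C<Tracer>.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the trace lines whose trace type is "Problem".
# Assumes the "MM/DD/YYYY HH:MM:SS <type>: message" layout shown in the fragment.
sub problem_lines {
    my @lines = @_;
    return grep { m!^\d\d/\d\d/\d{4} \d\d:\d\d:\d\d <Problem>: ! } @lines;
}

# Two lines taken from the fragment above.
my @sample = (
    '11/02/2005 19:15:17 <main>: Processing feature fig|100226.1.peg.7741.',
    '11/02/2005 19:15:17 <Problem>: Incorrect assignment for fig|100226.1.peg.7741...',
);
print "$_\n" for problem_lines(@sample);
```

The same filter could be applied to the lines read from C<trace.log> itself.
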
The test may reveal that some tables need to be reloaded, or that a software
problem has crept into Sprout.

Once all the tables have the correct data, C<index_sprout> can be run to create the
Glimpse indexes.

=cut

use strict;
use Tracer;
use DocUtils;
use Cwd;
use FIG;
use SFXlate;
use File::Copy;
use File::Path;
use SproutLoad;
use Stats;

# Get the command-line parameters and options.
my ($options, @parameters) = Tracer::ParseCommand({ geneFile => "", subsysFile => "",
                                                    trace => 3, limitedFeatures => 0,
                                                    dbLoad => 0 }, @ARGV);
# Set up tracing.
TSetup("$options->{trace} SproutLoad ERDBLoad ERDB Stats Tracer Load", "+>$FIG_Config::temp/trace.log");
# Create the sprout loader object. Note that the Sprout object does not
# open the database unless the "dbLoad" option is turned on.
my $fig = FIG->new();
my $sprout = SFXlate->new_sprout_only(undef, undef, undef, ! $options->{dbLoad});
my $spl = SproutLoad->new($sprout, $fig, $options->{geneFile}, $options->{subsysFile}, $options);
# Process the parameters. Each parameter names a table group; "*" selects every group.
for my $group (@parameters) {
    Trace("Processing load group $group.") if T(2);
    if ($group eq 'Genome' || $group eq '*') {
        $spl->LoadGenomeData();
    }
    if ($group eq 'Feature' || $group eq '*') {
        $spl->LoadFeatureData();
    }
    if ($group eq 'Coupling' || $group eq '*') {
        $spl->LoadCouplingData();
    }
    if ($group eq 'Subsystem' || $group eq '*') {
        $spl->LoadSubsystemData();
    }
    if ($group eq 'Property' || $group eq '*') {
        $spl->LoadPropertyData();
    }
    if ($group eq 'Diagram' || $group eq '*') {
        $spl->LoadDiagramData();
    }
    if ($group eq 'Annotation' || $group eq '*') {
        $spl->LoadAnnotationData();
    }
    if ($group eq 'BBH' || $group eq '*') {
        $spl->LoadBBHData();
    }
    if ($group eq 'Group' || $group eq '*') {
        $spl->LoadGroupData();
    }
    if ($group eq 'Source' || $group eq '*') {
        $spl->LoadSourceData();
    }
    if ($group eq 'External' || $group eq '*') {
        $spl->LoadExternalData();
    }
    if ($group eq 'Reaction' || $group eq '*') {
        $spl->LoadReactionData();
    }
}
Trace("Load complete.") if T(2);

1;
