
Diff of /FigTutorial/SEED_administration_issues.html


revision 1.4, Wed Jul 21 16:59:51 2004 UTC revision 1.9, Fri Jul 30 19:21:08 2004 UTC
# Line 1  Line 1 
1  <h1>SEED Administration</h1>  <h1>SEED Administration</h1>
2  <p>This tutorial discusses a number of issues that you will need to know about  
3    in order to install, share, and maintain your SEED installation.</p>  <p>
4  <h2>Backing Up Your Data</h2>  This tutorial discusses a number of issues that you will need to know about
5    in order to install, share, and maintain your SEED installation.
6    It is organized as follows:
7    </p>
8    
9    <ul>
10    <li><A HREF="#backups">
11         Backing Up Your Data
12    </A>
13    
14    <li><A HREF="#copying">
15         Copying a Version of the SEED
16    </A>
17    
18    <li><A HREF="#multiple_copies">
19         Running Multiple Copies of the SEED
20    </A>
21    
22    <li><A HREF="#adding_genomes">
23    Adding a New Genome to an Existing SEED
24    </A>
25    
26    <li><A HREF="#sims">
27        Computing Similarities
28    </A>
29    
30    <li><A HREF="#deleting_genomes">
31        Deleting Genomes from a Version of the SEED
32    </A>
33    
34    <li><A HREF="#reintegrate_sims">
35        Periodic Reintegration of Similarities
36    </A>
37    
38    <li><A HREF="#pins_and_clusters">
39        Computing "Pins" and "Clusters"
40    </A>
41    
42    </ul>
43    
44    
45    <h2 id="backups">Backing Up Your Data</h2>
46  The data and code stored within the SEED are organized as follows:  The data and code stored within the SEED are organized as follows:
47  <pre>  <pre>
48          ~fig                                 on a Mac: /Users/fig; on Linux: /home/fig          ~fig                                 on a Mac: /Users/fig; on Linux: /home/fig
# Line 56  Line 97 
97  would be a reasonable way to make a backup.  The copy preserves  would be a reasonable way to make a backup.  The copy preserves
98  permissions, copies recursively, and does not follow symbolic links.  permissions, copies recursively, and does not follow symbolic links.
99  <br>  <br>
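The three properties named above (preserve permissions, copy recursively, do not follow symbolic links) correspond to the standard cp options -p, -R, and -P. A minimal sketch, assuming a source and destination path supplied by the caller; the function name backup_seed is hypothetical and is not part of the SEED:

```shell
#!/bin/sh
# Sketch only: backup_seed and both paths are hypothetical, not SEED tools.
# Usage: backup_seed SOURCE_DIR DEST_DIR
backup_seed() {
    src=$1
    dest=$2
    # Refuse to reuse an existing destination; cp would otherwise
    # nest the new copy inside the old backup directory.
    if [ -e "$dest" ]; then
        echo "backup_seed: $dest already exists" >&2
        return 1
    fi
    # -p keeps modes and timestamps, -R recurses,
    # -P copies symbolic links as links instead of following them.
    cp -pRP "$src" "$dest"
}
```

Called as, for example, backup_seed ~fig/FIGdisk /some/backup/location, where the destination is whatever backup volume you use.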
100  <h2>Copying a Version of the SEED</h2>  <h2 id="copying">Copying a Version of the SEED</h2>
101    
102  To make a second copy of the SEED (either for a friend or for yourself), you should use tar  To make a second copy of the SEED (either for a friend or for yourself), you should use tar
103  to preserve a few symbolic links (which are relative, not absolute; this means that they can  to preserve a few symbolic links (which are relative, not absolute; this means that they can
# Line 156  Line 197 
197  <blockquote>  <blockquote>
198    <p><a href="http://www-unix.mcs.anl.gov/SEEDWiki/moin.cgi/SeedInstallationInstructions">      http://www-unix.mcs.anl.gov/SEEDWiki/moin.cgi/SeedInstallationInstructions</a></p>    <p><a href="http://www-unix.mcs.anl.gov/SEEDWiki/moin.cgi/SeedInstallationInstructions">      http://www-unix.mcs.anl.gov/SEEDWiki/moin.cgi/SeedInstallationInstructions</a></p>
199  </blockquote>  </blockquote>
200  <h2>Running Multiple Copies of the SEED</h2>  <h2 id="multiple_copies">Running Multiple Copies of the SEED</h2>
201    
202  For individual users that use the SEED to support comparative analysis, a single copy is completely  For individual users that use the SEED to support comparative analysis, a single copy is completely
203  adequate.  Adding genomes can usually be done without disrupting normal use, and a very occasional major  adequate.  Adding genomes can usually be done without disrupting normal use, and a very occasional major
# Line 166  Line 207 
207  effort.  In this case, you have a user community that is sensitive to disruptions of service, and you  effort.  In this case, you have a user community that is sensitive to disruptions of service, and you
208  have frequent demands to update versions of data.  In this case, it is best to have two systems: the  have frequent demands to update versions of data.  In this case, it is best to have two systems: the
209  <b>production system</b> is used to support the larger user community, and the <b>update system</b> is  <b>production system</b> is used to support the larger user community, and the <b>update system</b> is
210  used to prepare updated versions of the system.  Even so, work stoppages of 4-8 hours will occur when  used to prepare updated versions of the system.
211  new releases are swapped in.  To swap in new data from the update system to the production system,  New genomes are added to the update system, and then periodically a
212  you need to  revised Data directory is extracted to update the production system.
213    Even so, work stoppages of a few hours will occur when
214    new releases are swapped in.
215    <p>
216    This use of an "update" and a "production" system is quite analogous
217    to running a production system which is occasionally updated from new
218    Data DVDs (which FIG normally makes available about every 4-6 months).
219    That is, in both cases you are updating a production system from a
220    newly created <b>Data</b> directory that is lacking assignments and
221    annotations that exist on your production system.  However, if you have
222    added new genomes to the production system (that are not part of the
223    releases you may acquire via DVDs), you should get the new release,
224    install the versions of your local genomes, and then do this update
225    procedure.
226    <p>
227    The plan we propose is to build a completely encapsulated new version
228    of the system, then capture updates from the old production system, update
229    the new production system, and then make the new version the actual
230    production system.  This last step amounts to altering a symbolic link
231    to point at the new production system rather than the old.  This has
232    the virtue of ease of recovery -- that is, if something goes wrong you
233    can flip back to the old system.
234    The actual steps are as follows:
235  <ol>  <ol>
236  <li>stop all work on the production machine,  
237    <li> First, make sure that you are in the BASH shell by typing "echo $SHELL";
238       if the result does not end in "bash" (for example, "/bin/bash"), type "bash" to enter the BASH shell.
239    
240    <li> Next, check that the result of typing "which perl" is the version
241       of perl owned by the SEED; it should look something like
242       <pre>
243           /Users/fig/FIGdisk/env/mac/bin/perl
244       </pre>
245       although the exact results will depend on where your existing copy
246       of the SEED is installed, whether your platform is a Macintosh or Linux,
247       etc. If the result does not look similar to the above, type:
248       <pre>
249           source Path_to_FIGdisk/config/fig-user-env.sh
250       </pre>
251       to set up your FIG environment properly.
252    
253    <li> Next, make a copy of the Code Distribution Environment (from a DVD
254    or via the network).  Suppose that we have made such a directory in
255    CodeDistEnv.  Then use,
256    <pre>
257            cd CodeDistEnv
258            ./install-code TargetDirectory
259    </pre>
260    where <b>TargetDirectory</b> is where you wish to build the new
261    production version.  We recommend calling it something like
262    <b>FIGdisk.July24</b>.
263    
264    <li> Stop all work on the production machine for the duration of the update.
265         You do this by clicking on the "Seed Control Panel" link,
266         and then entering an explanatory message in the text box
267         and clicking on the "Disable SEED server" button.
268    
269  <li>You now need to capture the assignments, annotations and  <li>You now need to capture the assignments, annotations and
270  subsystems work that has been done on the production machine.  To do       subsystems work that has been done on the production machine.
271  this, you need to know when the last production release was       To do this, you need to know when the last production release
272  installed.  Suppose that it was July 1, 2004.  If that was the date,       was installed.  Suppose that it was July 1, 2004.
273  we recommend that you       If that was the date, we recommend that you run
 run<br><br>  
274  <pre>  <pre>
275      <b>extract_data_for_syncing_after_update 7/1/2004 /tmp/sync.data.july.1.2004<</b>          <b>extract_data_for_syncing_after_update 7/1/2004 /tmp/sync.data.july.1.2004</b>
276  </pre>  </pre>
277  <br><br>  
278  This will capture your updates and save them in the directory  This will capture your updates and save them in the directory
279  /tmp/sync.data.july.1.2004.       /tmp/sync.data.july.1.2004.<br>
280  <li>Now, you need to replace your <b>Data</b> directory (within  
281  <b>FIGdisk/FIG</b>) with the new version from the update system.  We  <li>Now, you need to stop the existing production system using
282  suggest that you do the following:  <pre>
283  <ol>          ~/FIGdisk/bin/stop-servers
284  <li>archive the existing <b>Data</b> directory.  These can usually be  </pre>
285  discarded within a month or two, but keeping them around is a good  
286  safety measure.  <li>Now, you need to configure the runtime environment for the system
287  <li>move a copy of the update <b>Data</b> directory into the  you are running on.
288  <b>FIGdisk/FIG</b> directory.  To do this, use
289  </ol>  <pre>
290  At this point, you have a version of the data from the update system          cd TargetDirectory
291  in the right location, but the internal databases all contain the old data.          ./configure MacOrLinux
292  <li> Now, run  </pre>
293    where <b>MacOrLinux</b> must be a currently supported environment.
294    Those that are supported on July 24, 2004 are <b>mac</b> for
295    Macintoshes running Panther, <b>mac-jaguar</b> for those that have not
296    upgraded to Panther, and <b>linux-postgres</b>.
297    
298    <li>Now, you need to insert the new Data directory into the newly
299    constructed version of the SEED.  To do this use
300    <pre>
301            chmod -R 777 TheNewData
302            cd TargetDirectory/FIG
303            ln -s TheNewData Data
304    </pre>
305    where TheNewData is the new Data directory, which normally comes from the
306    update system.  If you acquired a new Data directory via Data DVDs, you
307    will need to unpack them using the README instructions, but what
308    results is a new version of the <b>Data</b> directory.
309    
310    <li>Now, you need to start the servers in order to load the databases
311    with the new release using
312  <pre>  <pre>
313          <b>fig load_all</b>          cd TargetDirectory/bin
314            ./start-servers
315            cd ..
316            source config/fig-user-env.sh
317            init_FIG
318            fig load_all
319  </pre>  </pre>
320  to reload the production databases with the data from the newly inserted Data directory.  This last command will run for several hours.
321  This will usually take several hours.  
322  <li>Now, you need to capture the changes made to the old production  <li>Now, you need to capture the changes made to the old production
323  version using something like  version using something like
 <br>  
324  <pre>  <pre>
325          <b>sync_new_system /tmp/sync.data.july.1.2004 make-assignments</b>          <b>sync_new_system /tmp/sync.data.july.1.2004 make-assignments</b>
326  </pre>  </pre>
327  <br>  <li>Run
328  <li> make the production machine available for use.  <pre>
329            index_annotations
330            index_subsystems
331            make_indexes
332    </pre>
333    
334    <li> Now, finally, you should alter the symbolic link in <i>~fig</i> to
335    the current FIGdisk using something like:
336    <pre>
337            cd ~fig
338            rm FIGdisk     # should be removing a symbolic link to the current SEED
339            ln -s TargetDirectory FIGdisk
340    </pre>
341    That should make the new SEED the one available through the Web interface.
342    
343  <li>You should now bring your update system to the same state as the  <li>You should now bring your update system to the same state as the
344  production system.  This can be done by making sure that  production system.  This can be done by making sure that
345  <b>/tmp/sync.data.july.1.2004</b> is accessible to the update system.  <b>/tmp/sync.data.july.1.2004</b> is accessible to the update system.
# Line 222  Line 353 
353  <br>  <br>
354  on the update machine.  on the update machine.
355  </ol>  </ol>
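The symbolic-link swap in the steps above is what makes recovery easy: flipping back is the same operation with the old target. A minimal sketch of a guarded swap, assuming the link and target paths are passed in by the caller; switch_figdisk is a hypothetical helper, not a SEED command:

```shell
#!/bin/sh
# Sketch only: switch_figdisk is a hypothetical helper, not a SEED command.
# Usage: switch_figdisk LINK NEW_TARGET
switch_figdisk() {
    link=$1
    new_target=$2
    # Only ever replace a symbolic link; never remove a real directory.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "switch_figdisk: $link is not a symbolic link" >&2
        return 1
    fi
    if [ ! -d "$new_target" ]; then
        echo "switch_figdisk: $new_target is not a directory" >&2
        return 1
    fi
    # Print the old target so it can be restored if the new release misbehaves.
    if [ -L "$link" ]; then
        readlink "$link"
        rm "$link"
    fi
    ln -s "$new_target" "$link"
}
```

If the new release misbehaves, calling switch_figdisk again with the printed old target restores the previous production system.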
356    <p>
357    
358  Our experience is that anytime a group wishes to share a common production environment,  Our experience is that anytime a group wishes to share a common production environment,
359  this 2-system approach is the way to do it.  You can, if necessary,  this 2-system approach is the way to do it.  You can, if necessary,
360  put both systems on the same physical machine.  This does require some  put both systems on the same physical machine.  This does require some
# Line 233  Line 366 
366  desirable to spend a little more and get at least 1 gigabyte of main  desirable to spend a little more and get at least 1 gigabyte of main
367  memory and 200 gigabytes of external disk.  memory and 200 gigabytes of external disk.
368  <br>  <br>
369  <h2>Adding a New Genome to an Existing SEED</h2>  <h2 id="adding_genomes">Adding a New Genome to an Existing SEED</h2>
370  To add a new genome to a running SEED is fairly easy, but there are a  To add a new genome to a running SEED is fairly easy, but there are a
371  number of details that do have to be handled with care.  number of details that do have to be handled with care.
372  <p>  <p>
# Line 485  Line 618 
618  The <i>add_genome</i> request will add your new genome and queue a computational request that similarities  The <i>add_genome</i> request will add your new genome and queue a computational request that similarities
619  be computed for the protein-encoding genes.  be computed for the protein-encoding genes.
620    
621  <h2>Computing Similarities</h2>  <h2 id="sims">Computing Similarities</h2>
622    
623  Adding a genome does not automatically get similarities computed for the new genome; it queues the request.  Adding a genome does not automatically get similarities computed for the new genome; it queues the request.
624  To get the similarities actually computed, you need to establish a computational environment on which  To get the similarities actually computed, you need to establish a computational environment on which
# Line 536  Line 669 
669  interfaces for handling high-volume requests.  At FIG, we will maintain a set of instructions on how to set up  interfaces for handling high-volume requests.  At FIG, we will maintain a set of instructions on how to set up
670  your configuration to exploit these resources.  your configuration to exploit these resources.
671    
672  <h2>Deleting Genomes from a Version of the SEED </h2>  <h2 id="deleting_genomes">Deleting Genomes from a Version of the SEED</h2>
673    
674  There are two common instances in which one wishes to delete genomes from a running version of the SEED: one is  There are two common instances in which one wishes to delete genomes from a running version of the SEED: one is
675  when you wish to replace an existing version of a genome (in which case the replacement is viewed as first  when you wish to replace an existing version of a genome (in which case the replacement is viewed as first
# Line 574  Line 707 
707          rm -rf /Volumes/Tmp/ExtractedData          rm -rf /Volumes/Tmp/ExtractedData
708  </pre>  </pre>
709    
710  <h2>Periodic Reintegration of Similarities</h2>   <h2 id="reintegrate_sims">Periodic Reintegration of Similarities</h2>
711    
712  When the initial SEED was constructed, similarities were computed.  For most similarities of the form  When the initial SEED was constructed, similarities were computed.  For most similarities of the form
713  "Id1 and Id2 are similar", entries were "recorded" for both Id1 and Id2.  This is not always true,  "Id1 and Id2 are similar", entries were "recorded" for both Id1 and Id2.  This is not always true,
# Line 592  Line 725 
725  </pre>  </pre>
726  The job will probably run for quite a while (perhaps as much as a day or two).  The job will probably run for quite a while (perhaps as much as a day or two).
727    
728  <h2>Computing "Pins" and "Clusters"</h2>  <h2 id="pins_and_clusters">Computing "Pins" and "Clusters"</h2>
729    
730  The SEED displays potentially significant clusters on prokaryotic chromosomes.  In the  The SEED displays potentially significant clusters on prokaryotic chromosomes.  In the
731  process of finding preserved contiguity, it computes "pins", which are simply a set of genes  process of finding preserved contiguity, it computes "pins", which are simply a set of genes
