Diff of /FigTutorial/SEED_administration_issues.html

revision 1.1, Mon Jun 28 18:08:27 2004 UTC revision 1.9, Fri Jul 30 19:21:08 2004 UTC
# Line 1  Line 1 
1  <h1>Backing Up Your Data</h1>  <h1>SEED Administration</h1>
2    
3    <p>
4    This tutorial discusses a number of issues that you will need to know about
5    in order to install, share, and maintain your SEED installation.
6    It is organized as follows:
7    </p>
8    
9    <ul>
10    <li><A HREF="#backups">
11         Backing Up Your Data
12    </A>
13    
14    <li><A HREF="#copying">
15         Copying a Version of the SEED
16    </A>
17    
18    <li><A HREF="#multiple_copies">
19         Running Multiple Copies of the SEED
20    </A>
21    
22    <li><A HREF="#adding_genomes">
23    Adding a New Genome to an Existing SEED
24    </A>
25    
26    <li><A HREF="#sims">
27        Computing Similarities
28    </A>
29    
30    <li><A HREF="#deleting_genomes">
31        Deleting Genomes from a Version of the SEED
32    </A>
33    
34    <li><A HREF="#reintegrate_sims">
35        Periodic Reintegration of Similarities
36    </A>
37    
38    <li><A HREF="#pins_and_clusters">
39        Computing "Pins" and "Clusters"
40    </A>
41    
42    </ul>
43    
44    
45    <h2 id="backups">Backing Up Your Data</h2>
46  The data and code stored within the SEED are organized as follows:  The data and code stored within the SEED are organized as follows:
47  <pre>  <pre>
48          ~fig                                 on a Mac: /Users/fig; on Linux: /home/fig          ~fig                                 on a Mac: /Users/fig; on Linux: /home/fig
# Line 8  Line 52 
52                                  Tmp          temporary files                                  Tmp          temporary files
53                                  Data         data in readable form                                  Data         data in readable form
54  </pre>  </pre>
55  <br>  <ol><li>
 <br>  
 <ol>  
 <li>  
56  The directory <b>FIGdisk</b> holds both the code and data for the  The directory <b>FIGdisk</b> holds both the code and data for the
57  SEED.  The data is loaded into a database system that stores the data  SEED.  The data is loaded into a database system that stores the data
58  in a location external to FIGdisk, but otherwise a running SEED is  in a location external to FIGdisk, but otherwise a running SEED is
# Line 56  Line 97 
97  would be a reasonable way to make a backup.  The copy preserves  would be a reasonable way to make a backup.  The copy preserves
98  permissions, copies recursively, and does not follow symbolic links.  permissions, copies recursively, and does not follow symbolic links.
99  <br>  <br>
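A minimal sketch of such a backup follows. It assumes the copy is made with <code>cp -pRP</code> (preserve permissions, recurse, do not follow symbolic links); the exact command recommended by the tutorial falls in a part of the file that this diff does not show. The helper name <code>backup_tree</code> and the refusal to overwrite an existing destination are illustrative assumptions, not part of the SEED.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the SEED): copy a directory tree to a
# new location, preserving permissions and keeping symbolic links as links.
backup_tree() {
    local src="$1" dest="$2"

    if [ ! -d "$src" ]; then
        echo "backup_tree: $src is not a directory" >&2
        return 1
    fi
    # Refuse to write into an existing destination, so that an older
    # backup is never silently merged with a newer one.
    if [ -e "$dest" ]; then
        echo "backup_tree: $dest already exists" >&2
        return 1
    fi

    # -p preserve mode and timestamps, -R recurse, -P do not follow symlinks
    cp -pRP "$src" "$dest"
}
```

It might be called as <code>backup_tree ~fig/FIGdisk /Volumes/Backup/FIGdisk.backup</code>, where the destination path is only an example.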
100  <h1>Copying a Version of the SEED</h1>  <h2 id="copying">Copying a Version of the SEED</h2>
101    
102  To make a second copy of the SEED (either for a friend or for yourself), you should use tar  To make a second copy of the SEED (either for a friend or for yourself), you should use tar
103  to preserve a few symbolic links (which are relative, not absolute; this means that they can  to preserve a few symbolic links (which are relative, not absolute; this means that they can
# Line 88  Line 129 
129      <td>&nbsp;</td>      <td>&nbsp;</td>
130    </tr>    </tr>
131    <tr>    <tr>
132        <td><font face="Courier New, Courier, mono">bash</font></td>
133        <td>Switch to using the bash shell</td>
134      </tr>
135      <tr>
136      <td><font face="Courier New, Courier, mono">cd FIGdisk</font></td>      <td><font face="Courier New, Courier, mono">cd FIGdisk</font></td>
137      <td>&nbsp;</td>      <td>&nbsp;</td>
138    </tr>    </tr>
139    <tr>    <tr>
140      <td><font face="Courier New, Courier, mono">cp CURRENT_RELEASE DEFAULT_RELEASE</font></td>      <td height="23"><font face="Courier New, Courier, mono">cp CURRENT_RELEASE DEFAULT_RELEASE</font></td>
141      <td># Causes the new configuration to use the code that was running in the      <td># Causes the new configuration to use the code that was running in the
142        original installation</td>        original installation</td>
143    </tr>    </tr>
144      <tr>
145        <td height="23"><font face="Courier New, Courier, mono">./configure <em>arch-name</em></font></td>
146        <td># Configure the new SEED disk for architecture <em>arch-name</em>. </td>
147      </tr>
148      <tr>
149        <td height="23"><font face="Courier New, Courier, mono"> source config/fig-user-env.sh <br>
150        </font></td>
151        <td># Set up the environment for using the SEED</td>
152      </tr>
153      <tr>
154        <td height="23"><font face="Courier New, Courier, mono">start-servers <br>
155        </font></td>
156        <td># Start the database server and registration servers</td>
157      </tr>
158      <tr>
159        <td height="23"><font face="Courier New, Courier, mono">init_FIG <br>
160        </font></td>
161        <td># Initialize a new relational database</td>
162      </tr>
163      <tr>
164        <td height="23"><font face="Courier New, Courier, mono">fig load_all</font></td>
165        <td># Load the database from the SEED data files. This may take several hours</td>
166      </tr>
167  </table>  </table>
168  <p>At this point, the newly-copied FIGdisk can be configured for use. The full  <p>At this point, the new SEED copy should be ready to use. You only need to
169    documentation for SEED installation can currently be found at the following    perform the configure, init_FIG, and fig load_all steps once after installing
170      a new copy of the SEED. After a reboot or other clean start of the computer,
171      you will only have to do these steps:</p>
172    <table border="1" bgcolor="#EEEEEE">
173      <tr>
174        <td width="403"><font face="Courier New, Courier, mono">cd ~fig/FIGdisk</font></td>
175        <td width="285">&nbsp;</td>
176      </tr>
177      <tr>
178        <td><font face="Courier New, Courier, mono">bash</font></td>
179        <td>Switch to using the bash shell</td>
180      </tr>
181      <tr>
182        <td height="23"><font face="Courier New, Courier, mono"> source config/fig-user-env.sh <br>
183        </font></td>
184        <td># Set up the environment for using the SEED</td>
185      </tr>
186      <tr>
187        <td height="23"><font face="Courier New, Courier, mono">start-servers <br>
188        </font></td>
189        <td># Start the database server and registration servers</td>
190      </tr>
191    </table>
192    <p>Upon setting up a new computer for running SEED, you should read the full
193      documentation for SEED installation, as it has a number of platform-specific
194      modifications that need to be performed. This document can currently be found
195      at the following
196    location in the SEED Wiki:  </p>    location in the SEED Wiki:  </p>
197  <blockquote>  <blockquote>
198    <p><a href="http://www-unix.mcs.anl.gov/SEEDWiki/moin.cgi/SeedInstallationInstructions">      http://www-unix.mcs.anl.gov/SEEDWiki/moin.cgi/SeedInstallationInstructions</a></p>    <p><a href="http://www-unix.mcs.anl.gov/SEEDWiki/moin.cgi/SeedInstallationInstructions">      http://www-unix.mcs.anl.gov/SEEDWiki/moin.cgi/SeedInstallationInstructions</a></p>
199  </blockquote>  </blockquote>
200  <h1>Running Multiple Copies of the SEED</h1>  <h2 id="multiple_copies">Running Multiple Copies of the SEED</h2>
201    
202  For individual users that use the SEED to support comparative analysis, a single copy is completely  For individual users that use the SEED to support comparative analysis, a single copy is completely
203  adequate.  Adding genomes can usually be done without disrupting normal use, and a very occasional major  adequate.  Adding genomes can usually be done without disrupting normal use, and a very occasional major
# Line 113  Line 207 
207  effort.  In this case, you have a user community that is sensitive to disruptions of service, and you  effort.  In this case, you have a user community that is sensitive to disruptions of service, and you
208  have frequent demands to update versions of data.  In this case, it is best to have two systems: the  have frequent demands to update versions of data.  In this case, it is best to have two systems: the
209  <b>production system</b> is used to support the larger user community, and the <b>update system</b> is  <b>production system</b> is used to support the larger user community, and the <b>update system</b> is
210  used to prepare updated versions of the system.  Even so, work stoppages of 2-5 hours will occur when  used to prepare updated versions of the system.
211  new releases are swapped in.  To swap in new data from the update system to the production system,  New genomes are added to the update system, and then periodically a
212  you need to  revised Data directory is extracted to update the production system.
213    Even so, work stoppages of a few hours will occur when
214    new releases are swapped in.
215    <p>
216    This use of an "update" and a "production" system is quite analogous
217    to running a production system which is occasionally updated from new
218    Data DVDs (which FIG normally makes available about every 4-6 months).
219    That is, in both cases you are updating a production system from a
220    newly created <b>Data</b> directory that is lacking assignments and
221    annotations that exist on your production system.  However, if you have
222    added new genomes to the production system (that are not part of the
223    releases you may acquire via DVDs), you should get the new release,
224    install the versions of your local genomes, and then do this update
225    procedure.
226    <p>
227    The plan we propose is to build a completely encapsulated new version
228    of the system, then capture updates from the old production system, update
229    the new production system, and then make the new version the actual
230    production system.  This last step amounts to altering a symbolic link
231    to point at the new production system rather than the old.  This has
232    the virtue of ease of recovery -- that is, if something goes wrong you
233    can flip back to the old system.
234    The actual steps are as follows:
235  <ol>  <ol>
236  <li>stop all work on the production machine,  
237  <li>do a peer-to-peer update from the production machine to the update machine to  <li> First, make sure that you are in the BASH shell by typing "echo $SHELL";
238  capture all annotations and assignments,     if the result does not end in "bash" (for example, "/bin/bash"), type "bash" to enter the BASH shell.
239  <li> move the Data directory in the production machine to a backup location,  
240  <li> move in a copy of the update Data directory, and  <li> Next, check that the result of typing "which perl" is the version
241  <li> run     of perl owned by the SEED; it should look something like
242       <pre>
243           /Users/fig/FIGdisk/env/mac/bin/perl
244       </pre>
245       although the exact results will depend on where your existing copy
246       of the SEED is installed, whether your platform is a Macintosh or Linux,
247       etc. If the result does not look similar to the above, type:
248       <pre>
249           source Path_to_FIGdisk/config/fig-user-env.sh
250       </pre>
251       to set up your FIG environment properly.
252    
253    <li> Next, make a copy of the Code Distribution Environment (from a DVD
254    or via the network).  Suppose that we have made such a directory in
255    CodeDistEnv.  Then use,
256    <pre>
257            cd CodeDistEnv
258            ./install-code TargetDirectory
259    </pre>
260    where <b>TargetDirectory</b> is where you wish to build the new
261    production version.  We recommend calling it something like
262    <b>FIGdisk.July24</b>.
263    
264    <li> Stop all work on the production machine for the duration of the update.
265         You do this by clicking on the "Seed Control Panel" link,
266         and then entering an explanatory message in the text box
267         and clicking on the "Disable SEED server" button.
268    
269    <li> You now need to capture the assignments, annotations and
270         subsystems work that has been done on the production machine.
271         To do this, you need to know when the last production release
272         was installed.  Suppose that it was July 1, 2004.
273         If that was the date, we recommend that you run
274         <pre>
275            <b>extract_data_for_syncing_after_update 7/1/2004 /tmp/sync.data.july.1.2004</b>
276         </pre>
277    
278         This will capture your updates and save them in the directory
279         /tmp/sync.data.july.1.2004.<br>
280    
281    <li>Now, you need to stop the existing production system using
282    <pre>
283            ~/FIGdisk/bin/stop-servers
284    </pre>
285    
286    <li>Now, you need to configure the runtime environment for the system
287    you are running on.
288    To do this, use
289    <pre>
290            cd TargetDirectory
291            ./configure MacOrLinux
292    </pre>
293    where <b>MacOrLinux</b> must be a currently supported environment.
294    Those that are supported as of July 24, 2004 are <b>mac</b> for
295    Macintoshes running Panther, <b>mac-jaguar</b> for those that have not
296    upgraded to Panther, and <b>linux-postgres</b>.
297    
298    <li>Now, you need to insert the new Data directory into the newly
299    constructed version of the SEED.  To do this use
300  <pre>  <pre>
301            chmod -R 777 TheNewData
302            cd TargetDirectory/FIG
303            ln -s TheNewData Data
304    </pre>
305    where TheNewData is the new Data directory, which normally comes  from the
306    update system.  If you acquired a new Data directory via Data DVDs, you
307    will need to unpack them using the README instructions, but what
308    results is a new version of the <b>Data</b> directory.
309    
310    <li>Now, you need to start the servers in order to load the databases
311    with the new release using
312    <pre>
313            cd TargetDirectory/bin
314            ./start-servers
315            cd ..
316            source config/fig-user-env.sh
317            init_FIG
318          fig load_all          fig load_all
319  </pre>  </pre>
320  to reload the production databases with the data from the newly inserted Data directory.  This last command will run for several hours.
321  This will usually take several hours.  
322  <li> make the production machine available for use.  <li> Now, you need to capture the changes made to the old production
323         version using something like
324         <pre>
325             <b>sync_new_system /tmp/sync.data.july.1.2004 make-assignments</b>
326         </pre>
327    <li>Run
328    <pre>
329            index_annotations
330            index_subsystems
331            make_indexes
332    </pre>
333    
334    <li> Finally, you should change the <b>FIGdisk</b> symbolic link in <i>~fig</i> so that
335    it points at the new SEED rather than the old one, using something like:
336    <pre>
337            cd ~fig
338            rm FIGdisk     # should be removing a symbolic link to the current SEED
339            ln -s TargetDirectory FIGdisk
340    </pre>
341    That should make the new SEED the one available through the Web interface.
342    
343    <li> You should now bring your update system to the same state as the
344         production system.  This can be done by making sure that
345         <b>/tmp/sync.data.july.1.2004</b> is accessible to the update system.
346         If the production and update systems are run on the same machine, then
347         the directory is already there.  If not, copy it to <b>/tmp</b> on the
348         update machine.  Then run
349         <br>
350         <pre>
351             <b>sync_new_system /tmp/sync.data.july.1.2004 make-assignments</b>
352         </pre>
353         <br>
354         on the update machine.
355  </ol>  </ol>
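The final symbolic-link step in the list above, and the "flip back to the old system" recovery it allows, can be sketched as follows. The function names <code>switch_release</code> and <code>rollback_release</code> and the <code>.previous</code> bookkeeping file are hypothetical; they are not SEED commands, only one way to make the swap reversible.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the SEED) for swapping the FIGdisk
# symbolic link between releases and undoing the swap if needed.

# switch_release LINK NEW_TARGET
# Point LINK at NEW_TARGET, remembering the previous target in LINK.previous.
switch_release() {
    local link="$1" new_target="$2"

    # Never remove a real directory: LINK must be a symlink or not exist.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "switch_release: $link is not a symbolic link" >&2
        return 1
    fi
    if [ ! -d "$new_target" ]; then
        echo "switch_release: $new_target is not a directory" >&2
        return 1
    fi

    # Record where the link pointed so the change can be reversed.
    if [ -L "$link" ]; then
        readlink "$link" > "${link}.previous"
    fi

    rm -f "$link"
    ln -s "$new_target" "$link"
}

# rollback_release LINK
# Point LINK back at the target recorded by the last switch_release.
rollback_release() {
    local link="$1" old_target

    if [ ! -f "${link}.previous" ]; then
        echo "rollback_release: no previous target recorded for $link" >&2
        return 1
    fi
    old_target="$(cat "${link}.previous")"

    rm -f "$link"
    ln -s "$old_target" "$link"
}
```

With the names used in the steps above, this would be run as <code>switch_release ~fig/FIGdisk TargetDirectory</code>, and <code>rollback_release ~fig/FIGdisk</code> would restore the old production system if something goes wrong.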
356    <p>
357    
358  Our experience is that anytime a group wishes to share a common production environment,  Our experience is that anytime a group wishes to share a common production environment,
359  this 2-system approach is the way to do it.  this 2-system approach is the way to do it.  You can, if necessary,
360    put both systems on the same physical machine.  This does require some
361    special handling in setting up two different <b>FIGdisk</b>
362    directories.  We recommend using <b>FIGdisk.production</b> and
363    <b>FIGdisk.update</b>.  However, in general it makes sense to use two
364    separate physical machines, for backup if nothing else.  The update
365    system can usually be run on a $2000 (or less) box, although it is
366    desirable to spend a little more and get at least 1 gigabyte of main
367    memory and 200 gigabytes of external disk.
368  <br>  <br>
369  <h1>Adding a New Genome to an Existing SEED</h1>  <h2 id="adding_genomes">Adding a New Genome to an Existing SEED</h2>
370  To add a new genome to a running SEED is fairly easy, but there are a  To add a new genome to a running SEED is fairly easy, but there are a
371  number of details that do have to be handled with care.  number of details that do have to be handled with care.
372  <p>  <p>
# Line 385  Line 618 
618  The <i>add_genome</i> request will add your new genome and queue a computational request that similarities  The <i>add_genome</i> request will add your new genome and queue a computational request that similarities
619  be computed for the protein-encoding genes.  be computed for the protein-encoding genes.
620    
621  <h1>Computing Similarities</h1>  <h2 id="sims">Computing Similarities</h2>
622    
623  Adding a genome does not automatically get similarities computed for the new genome; it queues the request.  Adding a genome does not automatically get similarities computed for the new genome; it queues the request.
624  To get the similarities actually computed, you need to establish a computational environment on which  To get the similarities actually computed, you need to establish a computational environment on which
# Line 436  Line 669 
669  interfaces for handling high-volume requests.  At FIG, we will maintain a set of instructions on how to set up  interfaces for handling high-volume requests.  At FIG, we will maintain a set of instructions on how to set up
670  your configuration to exploit these resources.  your configuration to exploit these resources.
671    
672  <h1>Deleting Genomes from a Version of the SEED </h1>  <h2 id="deleting_genomes">Deleting Genomes from a Version of the SEED</h2>
673    
674  There are two common instances in which one wishes to delete genomes from a running version of the SEED: one is  There are two common instances in which one wishes to delete genomes from a running version of the SEED: one is
675  when you wish to replace an existing version of a genome (in which case the replacement is viewed as first  when you wish to replace an existing version of a genome (in which case the replacement is viewed as first
# Line 472  Line 705 
705  <pre>  <pre>
706          make_a_SEED /Users/fig/FIGdisk /Volumes/Tmp/ExtractedData /Volumes/MyFriend/FIGdisk.ReadyToGo          make_a_SEED /Users/fig/FIGdisk /Volumes/Tmp/ExtractedData /Volumes/MyFriend/FIGdisk.ReadyToGo
707          rm -rf /Volumes/Tmp/ExtractedData          rm -rf /Volumes/Tmp/ExtractedData
 <<<< Bob, can you write make_a_SEED??? >>>  
708  </pre>  </pre>
709    
710  <h1>Periodic Reintegration of Similarities</h1>   <h2 id="reintegrate_sims">Periodic Reintegration of Similarities</h2>
711    
712  When the initial SEED was constructed, similarities were computed.  For most similarities of the form  When the initial SEED was constructed, similarities were computed.  For most similarities of the form
713  "Id1 and Id2 are similar", entries were "recorded" for both Id1 and Id2.  This is not always true,  "Id1 and Id2 are similar", entries were "recorded" for both Id1 and Id2.  This is not always true,
# Line 493  Line 725 
725  </pre>  </pre>
726  The job will probably run for quite a while (perhaps as much as a day or two).  The job will probably run for quite a while (perhaps as much as a day or two).
727    
728  <h1>Computing "Pins" and "Clusters"</h1>  <h2 id="pins_and_clusters">Computing "Pins" and "Clusters"</h2>
729    
730  The SEED displays potentially significant clusters on prokaryotic chromosomes.  In the  The SEED displays potentially significant clusters on prokaryotic chromosomes.  In the
731  process of finding preserved contiguity, it computes "pins", which are simply a set of genes  process of finding preserved contiguity, it computes "pins", which are simply a set of genes

