
Diff of /FigTutorial/SEED_administration_issues.html


revision 1.14, Mon Nov 8 19:13:58 2004 UTC revision 1.15, Tue Jan 25 22:57:46 2005 UTC
# Line 661  Line 661 
661    
662  <h2 id="sims">Computing Similarities</h2>  <h2 id="sims">Computing Similarities</h2>
663    
664  Adding a genome does not automatically get similarities computed for the new genome; it queues the request.  Adding a genome does not automatically get similarities computed for the new genome.
665  To get the similarities actually computed, you need to establish a computational environment on which  To get the similarities actually computed, you need to compute them and make them available in
666  the blast runs will be made, and then initiate a request on the machine running the SEED.  the <b>FIGdisk/FIG/Data/NewSims</b> directory.
667  <p>  <p>
668  This is not a completely trivial process because there are a variety of different ways to compute  To compute similarities, you will need to do the following:
 similarities:  
669  <ol>  <ol>
670  <li> You can just compute them on the system running the SEED.  This can take several days, but this  <li>The translations of the set of PEGs in your new genome (e.g., genome 562.4) should be in
671  is often a perfectly reasonable way to get the job done.  <b>~fig/FIGdisk/FIG/Data/Organisms/562.4/Features/peg/fasta</b>.  A copy of this was appended to
672  <li>Alternatively, you may be in an environment where you have a set of networked machines (say, 4-5 machines),  <b>~fig/FIGdisk/FIG/Data/Global/nr</b> when your genome was added.  <b>nr</b> is the "nonredundant database"
673  and you wish to just exploit these machines to do the blast runs.  we use to compute similarities (and the one you must use).  To get the initial blast results, you would use something
674  <li> Finally, you may be dealing with a large genome or genomes (and, hence, the need for many days of computation).  like
 In this case, it makes sense to utilize a large computational resource, and this resource may either  
 be a local cluster or a service provided over the net.  
 </ol>  
675  <br>  <br>
676  To establish the flexibility needed to support all of these alternatives, we implemented the following  <pre>
677  approach:            blastall -i ~fig/FIGdisk/FIG/Data/Organisms/562.4/Features/peg/fasta -d ~fig/FIGdisk/FIG/Data/Global/nr -m 8 -FF -p blastp | reduce_sims ~fig/FIGdisk/FIG/Data/Global/peg.synonyms 300 > reduced.sims
678  <ul>  </pre>
 <li>  
 The user can describe one or more <b>similarity computational environments</b>  
 in a configuration file called <i>similarities.config</i>.  The details of this encoding  
 are beyond the scope of this document.  
 These environments all represent potential ways to compute similarities.  
679  <br>  <br>
680    which produces the blast results in a tab-separated format.  The invocation of <b>reduce_sims</b> is optional.
681    It has the effect of limiting the retained similarities for each PEG to 300, with a truncation approach that attempts to preserve at least one similarity against each other genome (i.e., the trimming is selective).
682  <li>  <li>
683  When a SEED systems administrator (usually, the normal SEED user) wishes to run similarities,  The output of blastall lacks 2 columns that we need -- columns containing the length of each of the similar sequences.  To add that, you would use
684  he runs a program specifying a specific similarity computational environment.  This causes all  <br>
685  the queued similarity requests to be batched up and sent off to the specified server (which may simply  <pre>
686  be on the same machine).  He would use the <b>generate_similarities</b> command specifying two parameters: the          reformat_sims ~fig/FIGdisk/FIG/Data/Global/nr < reduced.sims > ~fig/FIGdisk/FIG/Data/NewSims/sims.for.562.4
687  first specifies a similarities computational environment, and the second specifies whether or not automated assignments  </pre>
688  should be computed as the similarity computations complete and the results are installed.  <br>
689  As the similarities complete, they will automatically be installed.  Further, if a set of similarities arrive  This will actually append two columns to each similarity and place the results in the <b>NewSims</b>
690  for a given protein-encoding gene, and if there is no current assignment of function for the gene,  directory where it should be.
691  an automated assignment may be computed.  Whether or not such automated assignments are computed is determined  </ol>
 by the second parameter in the command used by the systems administrator to initiate the request.  For example,  
 <pre>  
         generate_similarities local auto-assignments  
 </pre>  
 specifies a similarity computational environment labeled <i>local</i>, which presumably means "run the blast  
 requests on this machine", and requests automated assignments for all protein-encoding genes that currently either  
 have no assigned function or have an assigned function that is "hypothetical".  
 </ul>  
 <br>  
   
 We anticipate that at least one major center (Argonne National Lab) and, perhaps, more will create well-defined  
 interfaces for handling high-volume requests.  At FIG, we will maintain a set of instructions on how to set up  
 your configuration to exploit these resources.  
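The second step added in revision 1.15 appends two sequence-length columns to each tab-separated blast hit. A minimal sketch of that idea follows. It is not the SEED <b>reformat_sims</b> program: the function name <b>append_lengths</b> is hypothetical, and the column order (query length, then subject length) and the rule that a sequence ID ends at the first whitespace of its FASTA header are assumptions.

```shell
# Hypothetical helper, NOT the SEED reformat_sims program.
# Reads tab-separated blast hits (-m 8 style) on stdin and appends two columns:
# the length of the query sequence and the length of the subject sequence.
# Assumption: both IDs can be found in the FASTA file given as the first argument.
append_lengths() {
    fasta="$1"
    awk -F'\t' -v OFS='\t' '
        # First input: the FASTA file. Sum the residues under each header.
        NR == FNR {
            if (substr($0, 1, 1) == ">") {
                id = substr($0, 2)
                sub(/[ \t].*/, "", id)   # assumed: ID ends at first whitespace
            } else if (id != "") {
                len[id] += length($0)
            }
            next
        }
        # Second input (stdin): column 1 is the query ID, column 2 the subject ID.
        # An ID missing from the FASTA file yields an empty length field.
        { print $0, len[$1], len[$2] }
    ' "$fasta" -
}
```

With the file names used in the new text, this would be invoked as `append_lengths ~fig/FIGdisk/FIG/Data/Global/nr < reduced.sims`, with the output redirected to a file in the <b>NewSims</b> directory.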
692  <p>  <p>
693  No matter how you produce the new similarities, they need to be added  No matter how you produce the new similarities, they need to be added
694  as a file in the <b>FIGdisk/FIG/Data/NewSims</b> directory.  Then, you  as a file in the <b>FIGdisk/FIG/Data/NewSims</b> directory.  Then, you

