Diff of /FigTutorial/SEED_administration_issues.html


revision 1.16, Tue Jan 25 23:01:46 2005 UTC revision 1.18, Wed Jul 19 17:36:16 2006 UTC
# Line 23  Line 23 
23  Adding a New Genome to an Existing SEED  Adding a New Genome to an Existing SEED
24  </A>  </A>
25    
26    <li><A HREF="#importing_external">
27    Importing External Protein Data
28    </A>
29    
30  <li><A HREF="#sims">  <li><A HREF="#sims">
31      Computing Similarities      Computing Similarities
32  </A>  </A>
# Line 659  Line 663 
663  The <i>add_genome</i> request will add your new genome and queue a computational request that similarities  The <i>add_genome</i> request will add your new genome and queue a computational request that similarities
664  be computed for the protein-encoding genes.  be computed for the protein-encoding genes.
665    
666    <h2 id="importing_external">Importing External Protein Data</h2>
667    
668    The presence of external judgements about the possible functions of encoded proteins
669    is one of the essential aspects of the SEED.  It becomes important that one be able to
670    add new sources of annotation, as well as periodically updating the judgements of
671    existing sources.  To update the external sets of proteins and annotations, build a new nonredundant
672    database of proteins, and compute the associated similarities, one should proceed as follows:
673    
674    <ol>
675    <li> Stop using the system until this procedure completes.
676    <br><br>
677    <li> Update the NR Directory
678    <br><br>
679    The <b>NR</b> directory is located within the <b>Data</b> directory:
680    <br>
681    <pre>
682            ~fig                                      on a Mac: /Users/fig; on Linux: /home/fig
683                    FIGdisk
684                            dist                      source code
685                            FIG
686                                    Tmp               temporary files
687                                    Data              data in readable form
688                                              NR      Contains external Data
689    
690    </pre>
691    
692    The <b>NR</b> directory contains one subdirectory for each source of external
693    assignments (the released SEED includes subdirectories for SwissProt, NCBI, UniProt, and KEGG).
694    You may add more subdirectories.
695    <p>
696    Each subdirectory must include 3 files:
697    <ol>
698    <li> <b>fasta</b> should be a fasta file containing the protein sequences.  These sequences will
699    be used to establish a correspondence between these IDs and other protein sequences within the SEED.
700    <br><br>
701    <li> <b>org.table</b> is a two-column, tab-separated table.  Column 1 is the ID, and column 2 is the
702    organism corresponding to the ID.
703    <br><br>
704    <li> <b>assign_functions</b> is a 2-column table.  The ID is in column 1, and column 2 contains the
705    gene function (often called a <i>product name</i>) asserted by the external source.
706    </ol>
707    <br>
708    You should proceed only when you have updated as many of the sources as you wish.
709    <br><br>
710    <li> Now run
711    <pre>
712           import_external_sequences_step1
713    </pre>
714    
715    This program will build a new nonredundant database, check to see what has changed, and will
716    build the input required to compute new similarities.
717    <br><br>
718    <li> Compute the needed similarities
719    
720    You will need three files to compute a new batch of similarities.  The locations of these
721    three files are displayed by <b>import_external_sequences_step1</b> just before completion
722    (i.e., you should have gotten them as the output of the last step).  Compute the similarities (see
723    the discussion below) and store them in the <b>NewSims</b> directory (again the precise location
724    was displayed by <b>import_external_sequences_step1</b>).
725    <br><br>
726    <li> Run
727    <pre>
728           import_external_sequences_step3
729    </pre>
730    </ol>
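
Before running import_external_sequences_step1, it may help to confirm that each source subdirectory under NR is laid out as described above. The following is a minimal sketch, not part of the SEED distribution: the function names are hypothetical, the ID of a sequence is assumed to be the first whitespace-delimited token on its fasta header line, and assign_functions is assumed to be tab-separated like org.table (the text only says it has two columns).

```python
import os

# The three files the text says every NR source subdirectory must contain.
REQUIRED_FILES = ("fasta", "org.table", "assign_functions")


def fasta_ids(path):
    """Return the set of sequence IDs in a fasta file.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def table_ids(path):
    """Return the column-1 IDs of a two-column table.

    Assumption: columns are tab-separated. Raises ValueError on a
    non-empty row that does not have exactly two columns.
    """
    ids = set()
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            if not line:
                continue
            columns = line.split("\t")
            if len(columns) != 2:
                raise ValueError(
                    f"{path}:{number}: expected 2 columns, got {len(columns)}"
                )
            ids.add(columns[0])
    return ids


def check_source_dir(source_dir):
    """Return a list of problems found; an empty list means none were found."""
    missing = [
        name
        for name in REQUIRED_FILES
        if not os.path.isfile(os.path.join(source_dir, name))
    ]
    if missing:
        return [f"missing file: {name}" for name in missing]

    problems = []
    sequences = fasta_ids(os.path.join(source_dir, "fasta"))
    for name in ("org.table", "assign_functions"):
        unknown = table_ids(os.path.join(source_dir, name)) - sequences
        for identifier in sorted(unknown):
            problems.append(f"{name}: ID {identifier} has no sequence in fasta")
    return problems
```

Calling check_source_dir on the path of one source subdirectory returns a list of messages. The check only compares IDs across the three files; it does not verify that the sequences or function names themselves are sensible.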
731    
732  <h2 id="sims">Computing Similarities</h2>  <h2 id="sims">Computing Similarities</h2>
733    
734  Adding a genome does not automatically get similarities computed for the new genome.  Adding a genome does not automatically get similarities computed for the new genome.
# Line 753  Line 823 
823  <p>  <p>
824  To delete a set of genomes from a running version of the SEED, just use  To delete a set of genomes from a running version of the SEED, just use
825  <pre>  <pre>
826          fig delete_genomes G1 G2 ...Gn  (where G1 G2 ... Gn designates a list of genomes)          fig mark_deleted_genomes User G1 G2 ...Gn  (where G1 G2 ... Gn designates a list of genomes)
827  </pre>  </pre>
828  For example,  For example,
829  <pre>  <pre>
830          fig delete_genomes 562.1          fig mark_deleted_genomes RossO 562.1
831  </pre>  </pre>
832  could be used to delete a single genome with a genome ID of 562.1.  could be used to delete a single genome with a genome ID of 562.1.
 <p>  
 To make a copy with some genomes deleted to give to someone else requires a little different approach.  
 To extract a set of genomes from an existing version of the SEED, you need to run the command  
 <pre>  
         extract_genomes Which ExistingData ExtractedData  
 </pre>  
   
 The first argument is either the word "unrestricted" or the name of a file containing a list of  
 genome IDs (the genomes that are to be retained in the extraction).  The second argument is  
 the path to the current Data directory.  The third argument specifies the name of a directory  
 that is created holding the extraction.  Thus,  
 <pre>  
         extract_genomes unrestricted ~/FIGdisk/FIG/Data /Volumes/Tmp/ExtractedData  
 </pre>  
 would created the extracted Data directory for you.  If you wish to then produce a fully distributable  
 version of the SEED from the existing version and the extracted Data directory, you would  
 use  
 <pre>  
         make_a_SEED ~/FIGdisk /Volumes/Tmp/ExtractedData /Volumes/MyFriend/FIGdisk.ReadyToGo  
         rm -rf /Volumes/Tmp/ExtractedData  
 </pre>  
833    
834   <h2 id="reintegrate_sims">Periodic Reintegration of Similarities</h2>   <h2 id="reintegrate_sims">Periodic Reintegration of Similarities</h2>
835    
# Line 873  Line 922 
922    
923  <li> Finally, install the automated assignments in the seed using the command  <li> Finally, install the automated assignments in the seed using the command
924  <pre>  <pre>
925      fig auto_assignF ~/Tmp/assigned_functions      fig assign_functionF master:automated_assignments  ~/Tmp/assigned_functions
926  </pre>  </pre>
927    
928  </ol>  </ol>
# Line 882  Line 931 
931  is quite simple and crude, being only slightly better than simply assigning  is quite simple and crude, being only slightly better than simply assigning
932  the function of the highest-scoring BLASTP hit; however, it at least provides  the function of the highest-scoring BLASTP hit; however, it at least provides
933  a "quick and dirty" starting point for making an initial assessment of a genome,  a "quick and dirty" starting point for making an initial assessment of a genome,
934  which may then be clraned up and refined by skilled genome annotators.  which may then be cleaned up and refined by skilled genome annotators.
935    
936    
937    

