
Diff of /FigTutorial/SEED_administration_issues.html


revision 1.16, Tue Jan 25 23:01:46 2005 UTC revision 1.17, Tue Apr 26 15:08:26 2005 UTC
# Line 23  Line 23 
23  Adding a New Genome to an Existing SEED  Adding a New Genome to an Existing SEED
24  </A>  </A>
25    
26    <li><A HREF="#importing_external">
27    Importing External Protein Data
28    </A>
29    
30  <li><A HREF="#sims">  <li><A HREF="#sims">
31      Computing Similarities      Computing Similarities
32  </A>  </A>
# Line 659  Line 663 
663  The <i>add_genome</i> request will add your new genome and queue a computational request that similarities  The <i>add_genome</i> request will add your new genome and queue a computational request that similarities
664  be computed for the protein-encoding genes.  be computed for the protein-encoding genes.
665    
666    <h2 id="importing_external">Importing External Protein Data</h2>
667    
668    External judgements about the possible functions of encoded proteins
669    are an essential aspect of the SEED.  One must therefore be able to
670    add new sources of annotation and to periodically update the judgements of
671    existing sources.  To update the external sets of proteins and annotations, build a new nonredundant
672    database of proteins, and compute the associated similarities, proceed as follows:
673    
674    <ol>
675    <li> Stop using the system until this procedure completes.
676    <br><br>
677    <li> Update the NR Directory
678    <br><br>
679    The <b>NR</b> directory is located within the <b>Data</b> directory:
680    <br>
681    <pre>
682            ~fig                                      on a Mac: /Users/fig; on Linux: /home/fig
683                    FIGdisk
684                            dist                      source code
685                            FIG
686                                    Tmp               temporary files
687                                    Data              data in readable form
688                                              NR      external data
689    
690    </pre>
691    
692    The <b>NR</b> directory contains one subdirectory for each source of external
693    assignments (the released SEED includes subdirectories for SwissProt, NCBI, UniProt, and KEGG).
694    You may add more subdirectories.
695    <p>
696    Each subdirectory must include three files:
697    <ol>
698    <li> <b>fasta</b> is a FASTA file containing the protein sequences.  These sequences are
699    used to establish a correspondence between the IDs in this file and other protein sequences within the SEED.
700    <br><br>
701    <li> <b>org.table</b> is a two-column, tab-separated table.  Column 1 is the ID, and column 2 is the
702    organism corresponding to the ID.
703    <br><br>
704    <li> <b>assign_functions</b> is a two-column table.  Column 1 is the ID, and column 2 is the
705    gene function (often called a <i>product name</i>) asserted by the external source.
706    </ol>
707    <br>
708    Proceed only after you have updated every source that you intend to update.
709    <br><br>
710    <li> Now run
711    <pre>
712           import_external_sequences_step1
713    </pre>
714    
715    This program builds a new nonredundant database, checks what has changed, and
716    builds the input required to compute new similarities.
717    <br><br>
718    <li> Compute the needed similarities
719    <br><br>
720    You will need three files to compute a new batch of similarities.  The locations of these
721    three files are displayed by <b>import_external_sequences_step1</b> just before it completes
722    (that is, they appear at the end of the output of the previous step).  Compute the similarities (see
723    the discussion below) and store them in the <b>NewSims</b> directory, whose precise location
724    is also displayed by <b>import_external_sequences_step1</b>.
725    <br><br>
726    <li> Run
727    <pre>
728           import_external_sequences_step3
729    </pre>
730    </ol>
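
Before running <b>import_external_sequences_step1</b>, it may help to sanity-check each source subdirectory of <b>NR</b> against the three-file layout described above. The following is only a minimal sketch, not part of the SEED distribution: the function names are hypothetical, and it assumes that <b>assign_functions</b> is tab-separated like <b>org.table</b>, that a FASTA ID is the first whitespace-delimited token after <code>&gt;</code>, and that every ID in the two tables should also appear in <b>fasta</b>. None of these assumptions is stated in the text above.

```python
from pathlib import Path

# File names taken from the description of an NR source subdirectory.
REQUIRED_FILES = ("fasta", "org.table", "assign_functions")


def read_fasta_ids(path):
    """Return the set of sequence IDs in a FASTA file.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def read_two_column_ids(path):
    """Return (ids, problems) for a two-column table.

    Assumption: columns are tab-separated (stated for org.table only).
    """
    path = Path(path)
    ids, problems = set(), []
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            if not line:
                continue  # ignore blank lines
            fields = line.split("\t")
            if len(fields) != 2 or not fields[0] or not fields[1]:
                problems.append(
                    f"{path.name}:{number}: expected 2 non-empty columns"
                )
            else:
                ids.add(fields[0])
    return ids, problems


def check_source_dir(source_dir):
    """Return a list of problems found in one NR source subdirectory."""
    source_dir = Path(source_dir)
    problems = [
        f"missing file: {name}"
        for name in REQUIRED_FILES
        if not (source_dir / name).is_file()
    ]
    if problems:
        return problems  # cannot check contents if a file is absent

    fasta_ids = read_fasta_ids(source_dir / "fasta")
    for name in ("org.table", "assign_functions"):
        ids, table_problems = read_two_column_ids(source_dir / name)
        problems.extend(table_problems)
        # Assumption: table IDs without a sequence are worth flagging.
        for unknown in sorted(ids - fasta_ids):
            problems.append(f"{name}: ID {unknown} not in fasta")
    return problems
```

Calling <code>check_source_dir</code> on one subdirectory of <b>NR</b> returns an empty list when nothing suspicious was found, so the check can be looped over every source before starting the import.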
731    
732  <h2 id="sims">Computing Similarities</h2>  <h2 id="sims">Computing Similarities</h2>
733    
734  Adding a genome does not automatically get similarities computed for the new genome.  Adding a genome does not automatically get similarities computed for the new genome.
# Line 873  Line 943 
943    
944  <li> Finally, install the automated assignments in the seed using the command  <li> Finally, install the automated assignments in the seed using the command
945  <pre>  <pre>
946      fig auto_assignF ~/Tmp/assigned_functions      fig assign_functionF master:automated_assignments  ~/Tmp/assigned_functions
947  </pre>  </pre>
948    
949  </ol>  </ol>
# Line 882  Line 952 
952  is quite simple and crude, being only slightly better than simply assigning  is quite simple and crude, being only slightly better than simply assigning
953  the function of the highest-scoring BLASTP hit; however, it at least provides  the function of the highest-scoring BLASTP hit; however, it at least provides
954  a "quick and dirty" starting point for making an initial assessment of a genome,  a "quick and dirty" starting point for making an initial assessment of a genome,
955  which may then be clraned up and refined by skilled genome annotators.  which may then be cleaned up and refined by skilled genome annotators.
956    
957    
958    

