Revision 1.1
Sun Jun 19 11:35:18 2005 UTC (14 years, 5 months ago) by overbeek
add a tutorial on finding relevant links

<h1>Why Use the SEED (a VERY basic tutorial)?</h1>

<h2>Introduction</h2>

Many of us think that the SEED is a very rich environment for studying
genomic data.  Indeed, we think that it offers many features
unavailable through other systems.  However, up to now it has always
been viewed as a system that was almost impossible to use without
extensive guidance.  In this tutorial, I argue that there is actually
a very small subset of the overall functionality that is very useful,
and that subset can be learned in a very short time with relatively
little effort.  The functionality that you need to learn involves four
steps: 
<ol>
<li><b>Finding a specific gene</b>.  Suppose that you know a gene that
you wish to study.  You may have a gene name from a research article,
an ID from a genomic database, or a piece of sequence.  Whatever the
starting point, you need to learn how to locate the SEED protein page
corresponding to the gene or protein you are interested in.
Hopefully, we can convey the basic steps that will get you there in
about 10 minutes or less.

<li><b>Finding similar genes that occur in clusters on prokaryotic
genomes</b>.  Functionally related genes tend to cluster on
prokaryotic genomes.  In most prokaryotes 50% or more of the genes are
clustered with related genes.  For any gene that you wish to study,
either it will occur in a cluster, or there will be a corresponding
gene in another genome that does occur in a cluster (this is, of
course, an overstatement; but it is essentially true).  Once you have
<i>located the gene/protein in the SEED</i>, the next step is to
<i>find the relevant clusters</i>.  It should take only about 5
minutes to learn how to do that.

<li><b>Getting a display that shows the relevant clusters in a number
of genomes</b>.  Once you have a cluster that includes a set of
functionally related genes, you need to get a visual overview of
different versions of this cluster as it exists in other sequenced
genomes.  It should take less than five minutes for you to figure out
how to do this.

<li><b>Finally, you need to study these clusters in the visual
display</b>.  This is an endlessly satisfying experience, so it is
pointless to think of a minimal time required to perform the task.

</ol>
There are many, many things that you cannot do with just these four
steps, but the functionality provided (locating relevant clusters of
genes) is a capability that is far more important than you might
realize.  And, this is the easiest way to do it.
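<br><br>
The idea behind step 2 (genes that sit close together in a number of genomes are likely to be functionally related) can be illustrated with a small Python sketch.  This is not how the SEED computes its clusters; the function name <b>co_located_pairs</b>, the <b>max_gap</b> cutoff, and the toy genomes with their "famA"-style labels are all invented for illustration.

```python
from collections import Counter
from itertools import combinations


def co_located_pairs(genomes, max_gap=2):
    """Count, for each pair of gene families, the genomes in which
    the two families lie within max_gap positions of each other.

    genomes maps a genome name to a list of family labels given in
    chromosomal order (a toy representation, assumed for this sketch).
    """
    counts = Counter()
    for order in genomes.values():
        # Position of each family along this genome.
        position = {family: i for i, family in enumerate(order)}
        for a, b in combinations(sorted(position), 2):
            if abs(position[a] - position[b]) <= max_gap:
                counts[(a, b)] += 1
    return counts


# Hypothetical data: three toy genomes, not real annotations.
toy_genomes = {
    "genome_1": ["famA", "famB", "famC", "famX", "famY"],
    "genome_2": ["famY", "famQ", "famR", "famS", "famB", "famA"],
    "genome_3": ["famA", "famZ", "famB", "famW", "famV", "famU", "famX"],
}

counts = co_located_pairs(toy_genomes)
for pair, n in counts.most_common(3):
    print(pair, "close together in", n, "genome(s)")
```

In this toy data, famA and famB stay close together in all three genomes, which is the kind of signal that suggests a functional relationship.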
<br><br>
In the rest of this tutorial, we will cover these four steps.

<h2> Step 1: Finding the Gene/Protein You Want to Study</h2>

Go to the <a href=http://theseed.uchicago.edu/FIG/index.cgi target=tutorial>initial page of the SEED</a>.
<br><br>
First, fill in your ID.  Use something of the form <b>master:FirstL</b>,
where "FirstL" should be your first name and the first initial of your
last name.  You can use anything you wish, but do try to make it
descriptive and unique.
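<br><br>
As a small illustration of the suggested ID format, the following Python sketch builds an ID from a first and last name.  The function name <b>seed_user_id</b> and the example name are hypothetical; the SEED itself accepts whatever you type.

```python
def seed_user_id(first_name: str, last_name: str) -> str:
    """Build an ID of the form master:FirstL (first name plus last initial)."""
    first = first_name.strip()
    last = last_name.strip()
    if not first or not last:
        raise ValueError("both a first and a last name are required")
    # Upper-case the first letter of the first name, keep the rest as typed,
    # then append the upper-cased initial of the last name.
    return "master:" + first[0].upper() + first[1:] + last[0].upper()


# Hypothetical user name, used only as an example.
print(seed_user_id("jane", "doe"))
```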
<br><br>
If you have one or more keywords (e.g., <b>dnaK</b> or
<b>gi|23016701</b>), you put them in the <b>Search Pattern:</b> field
and click on <b>Search</b>.
<br><br>
If you get a list of matched <i>protein-encoding genes</i>, you can
take any of the links to a specific gene that meets your criteria.
<br><br>
Do this now for <b>gi|23016701</b>, and verify that you can get to the
gene/protein page.
<br><br>
Now suppose that you wanted to find <i>dnaK</i> in <i>Bacillus
subtilis</i>.
To do this, fill in the search pattern with <i>dnaK</i>, select the
organism using the pull-down menu, and click on <b>Search genome
selected below</b>.
<br><br>
Verify that you can actually get to the gene/protein page for
<i>dnaK</i>.
<br><br>
Now, suppose that you have a piece of DNA or protein sequence, and you
wish to find the genes within a genome that contain the same or
similar sequences.  You can do this quite simply. First, paste your
sequence into the provided text window.  Then select <b>blastp</b> if the
provided sequence was a protein sequence or <b>blastn</b> if the
provided sequence was DNA.  Finally, select the organism you wish
to search from the pull-down menu.
Then click on <b>Search for Matches</b>.
You should get blast output, with links set to get you to the desired
gene/protein page.
<br><br>
You should now verify that "NDAERQATKDAGKIAGLEVERIINEPTAAALAYGLDKT" could be used to
locate <i>dnaK</i> in <i>Bacillus subtilis</i>.
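<br><br>
If you are unsure whether to select <b>blastp</b> or <b>blastn</b>, the choice depends only on whether your sequence is protein or DNA.  The Python sketch below makes a rough guess from the letters in the sequence.  The function name <b>choose_blast_program</b> and the 90% cutoff are assumptions made for this example, not part of the SEED.

```python
DNA_LETTERS = set("ACGTN")


def choose_blast_program(sequence: str) -> str:
    """Return 'blastn' if the sequence looks like DNA, otherwise 'blastp'."""
    residues = [c for c in sequence.upper() if c.isalpha()]
    if not residues:
        raise ValueError("sequence contains no letters")
    dna_fraction = sum(c in DNA_LETTERS for c in residues) / len(residues)
    # Assumption: a sequence that is at least 90% A/C/G/T/N is treated as DNA.
    # The cutoff is arbitrary and can misjudge very short sequences.
    return "blastn" if dna_fraction >= 0.9 else "blastp"


query = "NDAERQATKDAGKIAGLEVERIINEPTAAALAYGLDKT"
print(choose_blast_program(query))
```

For the sequence used in this tutorial, fewer than half of the letters are A, C, G, T, or N, so the sketch selects <b>blastp</b>.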
<br><br>
That ends our 10-minute discussion of how to find the gene/protein you
are interested in.  Clearly, there is much more that could be said
about how to use the SEED search facilities, but this should cover
the vast majority of your search needs.

<h2>Step 2: Finding similar genes that occur in clusters on prokaryotic genomes</h2>

Suppose that you have found a desired gene/protein page.  We have not
told you how to interpret it.  Nor do we intend to.  It is a page full
of information, links, and possible services.  Our strategy in this
simple tutorial is to just show you how to find <i>relevant clusters</i> of
genes, by which we mean clusters of functionally related genes that
include either the gene you are "positioned on" or a corresponding
gene in another organism.
<br><br>
First, position yourself on the gene/protein page for
<b>gi|21283241</b>.
<br><br>
The table at the top of the page describes the genes in the region of
the chromosome surrounding the gene you are positioned on
(<i>fig|196620.1.peg.1512</i>, which is the SEED ID for the gene
encoding <i>gi|21283241</i>).  The entry for the gene you are
positioned on is shown in green.  Just below the table is a small
graphical display of the region.  The gene you are positioned on is
shown in green.  Genes that are believed to be "functionally related"
(based on the fact that they occur close to each other in a number of
genomes) are shown in blue.  Others are shown in red.
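<br><br>
The SEED ID shown above can be split into its parts with a few lines of Python.  This sketch reads the layout off the single example <i>fig|196620.1.peg.1512</i> (a genome ID, a feature type such as "peg" for a protein-encoding gene, and a feature number); the <b>FigId</b> type and <b>parse_fig_id</b> function are hypothetical names, and other ID variants may exist that this does not handle.

```python
from typing import NamedTuple


class FigId(NamedTuple):
    genome: str        # e.g. "196620.1"
    feature_type: str  # e.g. "peg"
    number: int        # e.g. 1512


def parse_fig_id(fig_id: str) -> FigId:
    """Split an ID such as 'fig|196620.1.peg.1512' into its parts."""
    prefix, sep, rest = fig_id.partition("|")
    if prefix != "fig" or not sep:
        raise ValueError(f"not a fig ID: {fig_id!r}")
    parts = rest.rsplit(".", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected fig ID layout: {fig_id!r}")
    # The last two dot-separated fields are the feature type and number;
    # everything before them is taken to be the genome ID.
    genome, feature_type, number = parts
    return FigId(genome, feature_type, int(number))


print(parse_fig_id("fig|196620.1.peg.1512"))
```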
<br><br>
It so happens that the gene you are positioned on is in a cluster.
The cluster contains 7 genes.  Each of the genes in the cluster has a
little <b>Pins</b> link to the side.  
<br><br>
To find any larger clusters (occurring in other genomes) that contain
genes similar to the one you are positioned on, you can click on the
<b>CL</b> link just to the left of the gene.  Which genomes contain
larger clusters?  Were you able to locate the corresponding gene in 
<i>Bacillus subtilis subsp. subtilis str. 168</i> or in <i>Bacillus
cereus ATCC 14579</i>?  In each of those genomes the cluster is
slightly larger.
<br><br>
Note that you can find these largest clusters even when you are on a
gene that is not in a cluster (or even one from a eukaryotic genome).

<h2>Step 3: Getting a display that shows the relevant clusters in a number of genomes</h2>

Once you are positioned on a gene in a cluster (which may or may not
be one of the largest clusters), you should click on the <b>Pins</b>
button just to the left of the shaded green area.  Try it.
<br><br>
In a separate window, you should see a portrayal of different versions
of the same (or closely related) clusters as they occur in other
genomes.  The red genes are aligned in the center of the page, and
then all of the genes around this central "pin" are shown.  Similar
genes will have the same color.  You should be able to mouse-over
genes in the display and see the functions of the genes.
Finally, if you choose to click on the <b>Commentary</b> button,
another window will pop up containing information about each
of the colored sets of genes.
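<br><br>
The layout of this display, with one row per genome and the "pinned" gene lined up in the center, can be mimicked with a short Python sketch.  It is only meant to convey the idea of aligning regions on a pin; the function name <b>align_on_pin</b>, the <b>width</b> parameter, and the toy gene labels are invented, and the real display is drawn by the SEED itself.

```python
def align_on_pin(regions, width=2):
    """Return one row per genome with the pinned gene in the middle column.

    regions maps a genome name to (genes, pin_index), where genes is a list
    of labels in chromosomal order and pin_index locates the pinned gene.
    Each row has 2 * width + 1 columns; positions past either end of the
    gene list are filled with "-".
    """
    rows = {}
    for genome, (genes, pin_index) in regions.items():
        row = []
        for offset in range(-width, width + 1):
            i = pin_index + offset
            row.append(genes[i] if 0 <= i < len(genes) else "-")
        rows[genome] = row
    return rows


# Hypothetical regions; "PIN" marks the gene the display is centered on.
toy_regions = {
    "genome_1": (["a", "b", "PIN", "c"], 2),
    "genome_2": (["PIN", "d", "e"], 0),
}

rows = align_on_pin(toy_regions)
for genome, row in rows.items():
    print(genome, " ".join(f"{gene:>4}" for gene in row))
```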

<h2> An Exercise: Do Clusters Really Mean Anything?</h2>

Pick a pathway from central metabolism (i.e., a pathway that you know
exists in several organisms).  Then pick a gene from that pathway in an
organism that you know has the gene.  Now, find the gene/protein page
corresponding to the gene.  
<br><br>
Now, the question we pose is <i>"Can you now find large clusters for
the gene and, if you can, do the large clusters contain other
functional roles from the same pathway?"</i>
<br><br>
If you perform this exercise ten times, you should get a pretty
accurate feel for why we believe the study of gene clusters is of
central importance.
