Why Use the SEED (a VERY basic tutorial)?
Many of us think that the SEED is a very rich environment for studying
genomic data. Indeed, we think that it offers many features
unavailable through other systems. However, up to now it has always
been viewed as a system that was almost impossible to use without
extensive guidance. In this tutorial, I argue that there is actually
a very small subset of the overall functionality that is very useful,
and that subset can be learned in a very short time with relatively
little effort. The functionality that you need to learn involves four steps.
There are many, many things that you cannot do with just these four
steps, but the functionality provided (locating relevant clusters of
genes) is a capability that is far more important than you might
realize. And, this is the easiest way to do it.
- Finding a specific gene. Suppose that you know a gene that
you wish to study. You may have a gene name from a research article,
an ID from a genomic database, or a piece of sequence. Whatever the
starting point, you need to learn how to locate the SEED protein page
corresponding to the gene or protein you are interested in.
Hopefully, we can convey the basic steps that will get you there in
about 10 minutes of tutorial or less.
- Finding similar genes that occur in clusters on prokaryotic
genomes. Functionally related genes tend to cluster on
prokaryotic genomes. In most prokaryotes 50% or more of the genes are
clustered with related genes. For any gene that you wish to study,
either it will occur in a cluster, or there will be a corresponding
gene in another genome that does occur in a cluster (this is, of
course, an overstatement; but it is essentially true). Once you have
located the gene/protein in the SEED, the next step is to
find the relevant clusters. It should take only about 5
minutes to learn how to do that.
- Getting a display that shows the relevant clusters in a number
of genomes. Once you have a cluster that includes a set of
functionally related genes, you need to get a visual overview of
different versions of this cluster as it exists in other sequenced
genomes. It should take less than five minutes for you to figure out
how to do this.
- Finally, you need to study these clusters in the visual
display. This is an endlessly satisfying experience, so it is
pointless to think of a minimal time required to perform the task.
In the rest of this tutorial, we will cover these four steps.
Step 1: Finding the Gene/Protein You Want to Study
Go to the initial page of the SEED.
First, fill in your ID. Use something of the form master:FirstL,
where "FirstL" should be your first name and the first initial of your
last name. You can use anything you wish, but do try to make it
descriptive and unique.
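The ID convention described above can be sketched as a small helper. This is only an illustration: the function name seed_user_id and the choice to strip non-letter characters are assumptions, not part of the SEED itself, which accepts any ID you type.

```python
import re


def seed_user_id(first_name: str, last_name: str, prefix: str = "master") -> str:
    """Build an ID of the form prefix:FirstL (first name plus last initial).

    Hypothetical helper: the SEED only asks you to type such an ID by hand.
    """
    # Keep letters only, so "mary-ann" becomes "maryann".
    first = re.sub(r"[^A-Za-z]", "", first_name)
    last = re.sub(r"[^A-Za-z]", "", last_name)
    if not first or not last:
        raise ValueError("both names must contain at least one letter")
    return f"{prefix}:{first.capitalize()}{last[0].upper()}"


print(seed_user_id("jane", "doe"))
```

For a user named Jane Doe this gives master:JaneD, which matches the pattern suggested in the text.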
If you have one or more keywords (e.g., dnaK or
gi|23016701), you put them in the Search Pattern: field
and click on Search.
If you get a list of matched protein-encoding genes, you can
take any of the links to a specific gene that meets your criteria.
Do this now for gi|23016701, and verify that you can get to the
Now suppose that you wanted to find dnaK in Bacillus
To do this, fill in the search pattern with dnaK, select the
organism using the pull-down menu, and click on Search genome
Verify that you can actually get to the gene/protein page for
Now, suppose that you have a piece of DNA or protein sequence, and you
wish to find the genes within a genome that contain the same or
similar sequences. You can do this quite simply. First, paste your
sequence into the provided text window. Then select blastp if the
provided sequence was a protein sequence or blastn if the
provided sequence was DNA. Finally, select the organism you wish
to search from the pull-down menu.
Then click on Search for Matches.
You should get blast output, with links set to get you to the desired
You should now verify that "NDAERQATKDAGKIAGLEVERIINEPTAAALAYGLDKT" could be used to
locate dnaK in Bacillus subtilis.
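If you are unsure whether a sequence is DNA or protein, and therefore whether to select blastn or blastp, a rough check on its letters usually settles it. The sketch below is a heuristic written for this tutorial, not something the SEED does for you; the function name choose_blast_program and the 90% threshold are assumptions.

```python
# Letters that normally appear in nucleotide sequences (N = unknown base).
DNA_LETTERS = set("ACGTUN")


def choose_blast_program(sequence: str, threshold: float = 0.9) -> str:
    """Guess "blastn" for nucleotide input and "blastp" for protein input.

    Heuristic only: a sequence counts as DNA when at least `threshold`
    of its letters are nucleotide codes.
    """
    residues = [c for c in sequence.upper() if c.isalpha()]
    if not residues:
        raise ValueError("sequence contains no letters")
    dna_fraction = sum(c in DNA_LETTERS for c in residues) / len(residues)
    return "blastn" if dna_fraction >= threshold else "blastp"


# The protein fragment used in the exercise above.
print(choose_blast_program("NDAERQATKDAGKIAGLEVERIINEPTAAALAYGLDKT"))
```

The fragment from the exercise contains many letters that are not nucleotide codes (D, E, R, Q, K and so on), so the heuristic selects blastp. A very short peptide made up only of A, C, G and T residues would fool it, so treat the answer as a hint.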
That ends our 10-minute discussion of how to find the gene/protein you
are interested in. Clearly, there is much more that could be said
about how to use the SEED search facilities, but this should cover
the vast majority of your search needs.
Step 2: Finding similar genes that occur in clusters on prokaryotic genomes
Suppose that you have found a desired gene/protein page. We have not
told you how to interpret it. Nor do we intend to. It is a page full
of information, links, and possible services. Our strategy in this
simple tutorial is to just show you how to find relevant clusters of
genes, by which we mean clusters of functionally related genes that
include either the gene you are "positioned on" or a corresponding
gene in another organism.
First, position yourself on the gene/protein page for
The table at the top of the page describes the genes in the region of
the chromosome surrounding the gene you are positioned on
(fig|196620.1.peg.1512, which is the SEED ID for the gene
encoding gi|21283241). The entry for the gene you are
positioned on is shown in green. Just below the table is a small
graphical display of the region. The gene you are positioned on is
shown in green. Genes that are believed to be "functionally related"
(based on the fact that they occur close to each other in a number of
genomes) are shown in blue. Others are red.
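If you keep notes on the genes you visit, it can help to split a SEED ID into its parts. The sketch below infers the layout (fig|, a genome identifier, a feature type such as peg, and a number) from the single example in the text, fig|196620.1.peg.1512. The pattern and the field names are assumptions, and other ID shapes may exist.

```python
import re
from typing import NamedTuple


class SeedFeatureId(NamedTuple):
    genome: str        # e.g. "196620.1"
    feature_type: str  # e.g. "peg"
    number: int        # e.g. 1512


# Assumed layout, inferred from the example ID in the text.
_SEED_ID = re.compile(r"^fig\|(\d+\.\d+)\.([A-Za-z]+)\.(\d+)$")


def parse_seed_id(seed_id: str) -> SeedFeatureId:
    """Split an ID such as fig|196620.1.peg.1512 into its components."""
    match = _SEED_ID.match(seed_id.strip())
    if match is None:
        raise ValueError(f"not a recognised SEED ID: {seed_id!r}")
    genome, feature_type, number = match.groups()
    return SeedFeatureId(genome, feature_type, int(number))


print(parse_seed_id("fig|196620.1.peg.1512"))
```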
It so happens that the gene you are positioned on is in a cluster.
The cluster contains 7 genes. Each of the genes in the cluster has a
little Pins link to the side.
To find any larger clusters (occurring in other genomes) that contain
genes similar to the one you are positioned on, you can click on the
CL link just to the left of the gene. Which genomes contain
larger clusters? Were you able to locate the corresponding gene in
Bacillus subtilis subsp. subtilis str. 168 or in Bacillus
cereus ATCC 14579? In each of those genomes the cluster is
Note that you can find these largest clusters, even when you are on a
gene that is not in a cluster (or even one from a eukaryotic genome).
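The idea behind "functionally related" in this step, namely genes that occur close to each other in a number of genomes, can be illustrated with a toy calculation. The sketch below is not the SEED's actual method. It assumes each genome is given as an ordered list of gene-family labels, ignores strand and chromosome circularity, and uses made-up parameter names (max_gap, min_genomes).

```python
from collections import Counter
from typing import Dict, FrozenSet, List


def conserved_neighbors(
    genomes: Dict[str, List[str]], max_gap: int = 5, min_genomes: int = 2
) -> Dict[FrozenSet[str], int]:
    """Count, for each pair of gene families, the genomes where they are close.

    Two genes are "close" when at most `max_gap` positions apart in the
    ordered gene list. Only pairs close in `min_genomes` or more genomes
    are returned.
    """
    counts: Counter = Counter()
    for order in genomes.values():
        pairs_here = set()  # count each pair at most once per genome
        for i, a in enumerate(order):
            for b in order[i + 1 : i + 1 + max_gap]:
                if a != b:
                    pairs_here.add(frozenset((a, b)))
        counts.update(pairs_here)
    return {pair: n for pair, n in counts.items() if n >= min_genomes}


# Made-up gene orders for two hypothetical genomes.
toy = {
    "genome_1": ["x", "a", "b", "y"],
    "genome_2": ["b", "a", "z1", "z2", "z3", "x"],
}
print(conserved_neighbors(toy, max_gap=1))
```

With max_gap=1 only the pair a and b is adjacent in both toy genomes, so it is the only pair reported. The blue genes in the display play the same role: neighbours whose proximity is conserved across genomes.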
Step 3: Getting a display that shows the relevant clusters in a number of genomes
Once you are positioned on a gene in a cluster (which may or may not
be one of the largest clusters), you should click on the Pins
button just to the left of the shaded green area. Try it.
In a separate window, you should see a portrayal of different versions
of the same (or closely related) clusters as they occur in other
genomes. The red genes are aligned in the center of the page, and
then all of the genes around this central "pin" are shown. Similar
genes will have the same color. You should be able to mouse-over
genes in the display and see the functions of the genes.
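The layout just described, with one gene "pinned" to the centre and its neighbours drawn around it, amounts to padding each region so that the pinned gene falls in the same column. The sketch below shows only that idea in plain text. The gene labels are invented, and the real display also handles colour, strand, and gene length, which are ignored here.

```python
from typing import Dict, List, Tuple


def align_on_pin(
    regions: Dict[str, Tuple[List[str], int]]
) -> Dict[str, List[str]]:
    """Left-pad each region so every pinned gene lands in the same column.

    `regions` maps a genome name to (gene labels in order, index of the
    pinned gene in that list).
    """
    if not regions:
        return {}
    pin_column = max(pin_index for _, pin_index in regions.values())
    aligned = {}
    for genome, (genes, pin_index) in regions.items():
        if not 0 <= pin_index < len(genes):
            raise ValueError(f"pin index out of range for {genome}")
        aligned[genome] = ["-"] * (pin_column - pin_index) + list(genes)
    return aligned


# Invented regions: "PIN" marks the gene you are positioned on.
example = {
    "genome_1": (["geneA", "PIN", "geneB", "geneC"], 1),
    "genome_2": (["geneD", "geneA", "PIN", "geneB"], 2),
    "genome_3": (["PIN", "geneC"], 0),
}
for genome, row in align_on_pin(example).items():
    print(f"{genome:<10}" + " ".join(f"{gene:<6}" for gene in row))
```

Reading down the column that holds PIN, you can compare which neighbours recur from genome to genome, which is what the colours in the Pins window let you do at a glance.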
Finally, if you choose to click on the Commentary button,
another window will pop up containing information about each
of the colored sets of genes.
An Exercise: Do Clusters Really Mean Anything?
Pick a pathway from central metabolism (i.e., a pathway that you know
exists in several organisms). Then pick a gene from that pathway in an
organism that you know has the gene. Now, find the gene/protein page
corresponding to the gene.
Now, the question we pose is "Can you find large clusters for
the gene and, if you can, do the large clusters contain other
functional roles from the same pathway?"
If you perform this exercise ten times, you should get a pretty
accurate feel for why we believe the study of gene clusters is of