Why Use the SEED (a VERY basic tutorial)?


Many of us think that the SEED is a very rich environment for studying genomic data. Indeed, we think that it offers many features unavailable through other systems. However, up to now it has always been viewed as a system that was almost impossible to use without extensive guidance. In this tutorial, I argue that a small subset of the overall functionality is very useful, and that this subset can be learned in a short time with relatively little effort. The functionality that you need to learn involves four steps:
  1. Finding a specific gene. Suppose that you know a gene that you wish to study. You may have a gene name from a research article, an ID from a genomic database, or a piece of sequence. Whatever the starting point, you need to learn how to locate the SEED protein page corresponding to the gene or protein you are interested in. Hopefully, we can convey the basic steps that will get you there in about 10 minutes of tutorial or less.
  2. Finding similar genes that occur in clusters on prokaryotic genomes. Functionally related genes tend to cluster on prokaryotic genomes. In most prokaryotes 50% or more of the genes are clustered with related genes. For any gene that you wish to study, either it will occur in a cluster, or there will be a corresponding gene in another genome that does occur in a cluster (this is, of course, an overstatement; but it is essentially true). Once you have located the gene/protein in the SEED, the next step is to find the relevant clusters. It should take only about 5 minutes to learn how to do that.
  3. Getting a display that shows the relevant clusters in a number of genomes. Once you have a cluster that includes a set of functionally related genes, you need to get a visual overview of different versions of this cluster as it exists in other sequenced genomes. It should take less than five minutes for you to figure out how to do this.
  4. Finally, you need to study these clusters in the visual display. This is an endlessly satisfying experience, so it is pointless to think of a minimal time required to perform the task.
There are many, many things that you cannot do with just these four steps, but the functionality provided (locating relevant clusters of genes) is far more important than you might realize, and this is the easiest way to get it.

In the rest of this tutorial, we will cover these four steps.

Step 1: Finding the Gene/Protein You Want to Study

Go to the initial page of the SEED.

First, fill in your ID. Use something of the form master:FirstL, where "FirstL" should be your first name and the first initial of your last name. You can use anything you wish, but do try to make it descriptive and unique.

If you have one or more keywords (e.g., dnaK or gi|23016701), put them in the Search Pattern: field and click on Search.

If you get a list of matched protein-encoding genes, you can take any of the links to a specific gene that meets your criteria.

Do this now for gi|23016701, and verify that you can get to the gene/protein page.

Now suppose that you wanted to find dnaK in Bacillus subtilis. To do this, fill in the search pattern with dnaK, select the organism using the pull-down menu, and click on Search genome selected below.

Verify that you can actually get to the gene/protein page for dnaK.

Now, suppose that you have a piece of DNA or protein sequence, and you wish to find the genes within a genome that contain the same or similar sequences. You can do this quite simply. First, paste your sequence into the provided text window. Then select blastp if the provided sequence was a protein sequence or blastn if the provided sequence was DNA. Finally, select the organism you wish to search from the pull-down menu and click on Search for Matches. You should get blast output, with links that take you to the desired gene/protein page.

You should now verify that "NDAERQATKDAGKIAGLEVERIINEPTAAALAYGLDKT" could be used to locate dnaK in Bacillus subtilis.
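If you are unsure which of the two blast options fits a pasted sequence, a quick check of its letters usually settles it. The following is only a minimal sketch and is not part of the SEED: the function name guess_blast_program and the 90% threshold are assumptions made for illustration.

```python
# Letters that can plausibly appear in a nucleotide sequence (N = unknown base).
DNA_LETTERS = set("ACGTUN")


def guess_blast_program(sequence: str) -> str:
    """Return "blastn" if the sequence looks like DNA, otherwise "blastp"."""
    # Keep letters only, so pasted whitespace, digits, and quotes are ignored.
    letters = [ch for ch in sequence.upper() if ch.isalpha()]
    if not letters:
        raise ValueError("no sequence letters found")
    dna_fraction = sum(ch in DNA_LETTERS for ch in letters) / len(letters)
    # Assumed threshold: at least 90% nucleotide letters is treated as DNA.
    return "blastn" if dna_fraction >= 0.9 else "blastp"


query = "NDAERQATKDAGKIAGLEVERIINEPTAAALAYGLDKT"
print(guess_blast_program(query))  # prints "blastp"
```

The sequence used in this tutorial contains letters such as E, R, Q, and K that never occur in DNA, so it is classified as protein and blastp is the option to select.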

That ends our 10-minute discussion of how to find the gene/protein you are interested in. Clearly, there is much more that could be said about how to use the SEED search facilities, but this should cover the vast majority of your search needs.

Step 2: Finding similar genes that occur in clusters on prokaryotic genomes

Suppose that you have found a desired gene/protein page. We have not told you how to interpret it. Nor do we intend to. It is a page full of information, links, and possible services. Our strategy in this simple tutorial is to just show you how to find relevant clusters of genes, by which we mean clusters of functionally related genes that include either the gene you are "positioned on" or a corresponding gene in another organism.

First, position yourself on the gene/protein page for gi|21283241.

The table at the top of the page describes the genes in the region of the chromosome surrounding the gene you are positioned on (fig|196620.1.peg.1512, which is the SEED ID for the gene encoding gi|21283241). The entry for the gene you are positioned on is shown in green. Just below the table is a small graphical display of the region, in which the gene you are positioned on is again shown in green. Genes that are believed to be "functionally related" (based on the fact that they occur close to each other in a number of genomes) are shown in blue. Others are red.
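If you collect SEED IDs in your own notes or scripts, it can help to split them into their parts. The sketch below assumes, based only on the single example fig|196620.1.peg.1512, that an ID has the layout fig|<genome>.<feature type>.<number>; the names FeatureId and parse_fig_id are hypothetical and are not part of the SEED.

```python
import re
from typing import NamedTuple


class FeatureId(NamedTuple):
    genome: str        # e.g. "196620.1"
    feature_type: str  # e.g. "peg"
    number: int        # e.g. 1512


# Assumed layout, inferred from the one example ID in this tutorial.
_FIG_ID = re.compile(r"^fig\|(\d+\.\d+)\.([A-Za-z]+)\.(\d+)$")


def parse_fig_id(fig_id: str) -> FeatureId:
    """Split an ID such as fig|196620.1.peg.1512 into its three parts."""
    match = _FIG_ID.match(fig_id.strip())
    if match is None:
        raise ValueError(f"not a recognized SEED ID: {fig_id!r}")
    genome, feature_type, number = match.groups()
    return FeatureId(genome, feature_type, int(number))


print(parse_fig_id("fig|196620.1.peg.1512"))
```

Two IDs with the same genome part then belong to the same genome, which is a convenient check when you compare clusters across organisms.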

It so happens that the gene you are positioned on is in a cluster. The cluster contains 7 genes. Each of the genes in the cluster has a little Pins link to the side.

To find any larger clusters (occurring in other genomes) that contain genes similar to the one you are positioned on, you can click on the CL link just to the left of the gene. Which genomes contain larger clusters? Were you able to locate the corresponding gene in Bacillus subtilis subsp. subtilis str. 168 or in Bacillus cereus ATCC 14579? In each of those genomes the cluster is slightly larger.

Note that you can find these largest clusters even when you are on a gene that is not in a cluster (or even one from a eukaryotic genome).
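To make the idea of "functionally related" genes in this step more concrete, here is a toy sketch of counting how often two gene families lie close together across genomes. It is not the method the SEED actually uses: the function name conserved_neighbors, the 5000 bp gap, the family names, and the positions are all invented for illustration.

```python
from collections import Counter
from itertools import combinations


def conserved_neighbors(genomes, max_gap=5000):
    """Count, for each pair of gene families, the genomes where they lie close together."""
    counts = Counter()
    for genes in genomes.values():
        close_pairs = set()
        for (fam_a, pos_a), (fam_b, pos_b) in combinations(genes, 2):
            if fam_a != fam_b and abs(pos_a - pos_b) <= max_gap:
                close_pairs.add(frozenset((fam_a, fam_b)))
        # Each genome contributes at most one count per pair of families.
        counts.update(close_pairs)
    return counts


# Hypothetical data: genome name -> list of (gene family, start position).
genomes = {
    "genome_1": [("famA", 1000), ("famB", 2500), ("famC", 90000)],
    "genome_2": [("famA", 40000), ("famB", 42000), ("famC", 43500)],
    "genome_3": [("famB", 7000), ("famA", 8200), ("famD", 9000)],
}

counts = conserved_neighbors(genomes)
for pair, n in counts.most_common():
    print(sorted(pair), n)
```

In this made-up data famA and famB are neighbors in all three genomes, while every other pair is close in only one genome. Pairs that stay together in many genomes are the ones worth treating as candidates for a functional relationship.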

Step 3: Getting a display that shows the relevant clusters in a number of genomes

Once you are positioned on a gene in a cluster (which may or may not be one of the largest clusters), you should click on the Pins button just to the left of the shaded green area. Try it.

In a separate window, you should see a portrayal of different versions of the same (or closely related) clusters as they occur in other genomes. The red genes are aligned in the center of the page, and then all of the genes around this central "pin" are shown. Similar genes will have the same color. You should be able to mouse-over genes in the display and see the functions of the genes. Finally, if you choose to click on the Commentary button, another window will pop up containing information about each of the colored sets of genes.
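The layout of this display can be pictured as padding each genome's row until the pinned gene sits in the same column. The sketch below illustrates only that idea, with invented gene family names; align_on_pin is a hypothetical helper, not SEED code, and it assumes every row contains the pinned family.

```python
def align_on_pin(rows, pin_family):
    """Pad each genome's gene list so the pinned family lines up in one column."""
    # Position of the pinned gene in each row (raises ValueError if it is absent).
    pin_index = {name: genes.index(pin_family) for name, genes in rows.items()}
    left_width = max(pin_index.values())
    aligned = {}
    for name, genes in rows.items():
        padding = ["-"] * (left_width - pin_index[name])
        aligned[name] = padding + genes
    return aligned


# Hypothetical gene order around the pinned gene in three genomes.
rows = {
    "genome_1": ["famX", "famA", "famB", "famC"],
    "genome_2": ["famA", "famB"],
    "genome_3": ["famY", "famZ", "famA", "famC"],
}

aligned = align_on_pin(rows, "famA")
for name, genes in aligned.items():
    print(name, " ".join(f"{gene:>5}" for gene in genes))
```

Reading down the pinned column and then outward to either side is how you compare versions of the cluster: families that keep reappearing next to the pin are the interesting ones.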

An Exercise: Do Clusters Really Mean Anything?

Pick a pathway from central metabolism (i.e., a pathway that you know exists in several organisms). Then pick a gene from that pathway in an organism that you know has the gene. Now, find the gene/protein page corresponding to the gene.

Now, the question we pose is "Can you find large clusters for the gene and, if you can, do the large clusters contain other functional roles from the same pathway?"

If you perform this exercise ten times, you should get a pretty accurate feel for why we believe the study of gene clusters is of central importance.
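If you want to keep score while doing the exercise, you can write down the functional roles of the pathway and those you see in a cluster, and compare the two lists. This is a minimal sketch under that assumption; pathway_overlap is a hypothetical helper, and the role names are placeholders rather than real annotations.

```python
def pathway_overlap(pathway_roles, cluster_roles):
    """Return the roles shared by pathway and cluster, and the fraction of the pathway covered."""
    pathway = set(pathway_roles)
    shared = pathway & set(cluster_roles)
    fraction = len(shared) / len(pathway) if pathway else 0.0
    return sorted(shared), fraction


# Placeholder role names; substitute the roles you read off the SEED pages.
pathway_roles = ["role_1", "role_2", "role_3", "role_4"]
cluster_roles = ["role_2", "role_3", "unrelated_role"]

shared, fraction = pathway_overlap(pathway_roles, cluster_roles)
print(shared, fraction)  # prints ['role_2', 'role_3'] 0.5
```

Recording this fraction for each of your ten attempts gives a simple picture of how often the clusters you find line up with the pathway you started from.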