Assignment 1: Getting Started
The goal of this assignment is to acquaint you with exactly how to use
the SEED to extract basic information relating to a genome. We assume
that you know how to access a version of the SEED and that you have
access to someone who can help you get started.
You need to start by being able to locate specific genes. Suppose
that you have just been reading an article in the Journal of
Bacteriology and you wish to connect what you have been reading to
what you see in the SEED. There are at least two common cases:
- you have a gene name or a function from the article, or
- you have a short fragment of sequence from the article.
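For the second case, the core operation is an exact substring search over the genome's proteins. The Python sketch below assumes that you have saved the organism's protein sequences locally in FASTA format. The record IDs and sequences in it are made up. An exact match will miss a fragment that contains a typo or a variant residue, so a similarity search such as BLAST is the more forgiving option.

```python
"""Find proteins that contain a short amino acid fragment (exact match).

Minimal sketch: assumes the genome's proteins are available as FASTA text.
The IDs and sequences in the example are hypothetical.
"""
from typing import Dict, Iterable, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {sequence_id: sequence} from FASTA-formatted lines."""
    records: Dict[str, str] = {}
    current_id = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            # The ID is the first whitespace-delimited token after ">".
            current_id = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def find_fragment(proteins: Dict[str, str], fragment: str) -> List[Tuple[str, int]]:
    """Return (protein_id, 1-based start) for every exact occurrence of fragment."""
    fragment = fragment.upper()
    hits = []
    for pid, seq in proteins.items():
        start = seq.find(fragment)
        while start != -1:
            hits.append((pid, start + 1))
            start = seq.find(fragment, start + 1)  # also catch overlapping matches
    return hits


if __name__ == "__main__":
    example = """>fig|hypothetical.peg.1 example protein
MKTAYIAKQR
QISFVKSHFS
>fig|hypothetical.peg.2 another protein
MSTNPKPQRK
""".splitlines()
    proteins = parse_fasta(example)
    print(find_fragment(proteins, "KQRQISF"))  # [('fig|hypothetical.peg.1', 8)]
```

The fragment is matched case-insensitively and may span the line breaks of the FASTA file, because each record's lines are joined before searching.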
Problem 1.1: Locate the gene in Brucella melitensis suis that
was called BR0018 by the authors of the article describing the
Problem 1.2: Locate the gene in Brucella melitensis suis that
encodes a protein containing the following short amino acid sequence:
Once you have located a gene, you should be able to figure out
- what functions have been assigned to the gene by different people,
- what biochemical subsystem the gene belongs to,
- what genes from other organisms are similar to the given gene, and
- what annotations have been associated with any of those genes.
Problem 1.3: What functional roles have been assigned to the
gene BR0018 in Brucella melitensis suis?
Note that the FIG assignment differs from some of the others. Please
describe the differences (i.e., are the assignments incompatible?).
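A first step in deciding whether two assignments conflict is to break each one into its separate functional roles and compare the sets. The sketch below assumes SEED-style function strings in which " / " separates the roles of a multifunctional protein. It treats that convention as an assumption, and the assignment strings in it are invented.

```python
"""Compare functional assignments for one gene, role by role.

Minimal sketch: assumes SEED-style function strings in which " / " separates
the roles of a multifunctional protein. The example strings are made up.
"""
import re
from typing import Dict, FrozenSet


def roles(assignment: str) -> FrozenSet[str]:
    """Split an assignment into normalized roles (lower case, EC numbers dropped)."""
    cleaned = set()
    for part in assignment.split(" / "):
        # Drop parenthesized EC numbers such as "(EC 1.1.1.1)" or "(EC 3.1.-.-)".
        part = re.sub(r"\(EC [\d.\-n]+\)", "", part)
        cleaned.add(" ".join(part.lower().split()))
    return frozenset(cleaned)


def relationship(a: FrozenSet[str], b: FrozenSet[str]) -> str:
    """Classify how two role sets relate."""
    if a == b:
        return "identical"
    if a < b or b < a:
        return "nested"  # one assignment names a subset of the other's roles
    if a & b:
        return "overlapping"
    return "disjoint"


if __name__ == "__main__":
    # Hypothetical assignments for one gene from three sources.
    assignments: Dict[str, str] = {
        "FIG": "Enzyme A (EC 1.1.1.1) / Enzyme B (EC 2.2.2.2)",
        "source_2": "enzyme A (EC 1.1.1.1)",
        "source_3": "hypothetical protein",
    }
    fig = roles(assignments["FIG"])
    for source, text in assignments.items():
        print(source, relationship(fig, roles(text)))
```

A "disjoint" result is a prompt for closer reading, not a verdict. "hypothetical protein" is disjoint from any named role without contradicting it, while two different named enzymes may well be incompatible.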
Problem 1.4: What biochemical subsystem includes the gene, assuming the
FIG assignment is correct? What other functional roles would you
expect to exist, assuming the function is accurate?
Problem 1.5: Give the three "closest genes" from other organisms
based on similarity.
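Outside the SEED, the same question can be answered from a similarity search. The sketch below assumes tabular BLAST output with the 12 default columns of -outfmt 6, where e-value and bit score are the last two. It also assumes SEED-style identifiers of the form fig|<genome>.peg.<n>, so that hits from the query's own genome can be skipped. It does not reproduce how the SEED groups or expands its similarities.

```python
"""Pick the best-scoring hits from other genomes in BLAST tabular output.

Minimal sketch assuming the 12-column BLAST "-outfmt 6" layout and
SEED-style FIG IDs of the form fig|<genome>.peg.<n>.
"""
from typing import Dict, Iterable, List, Tuple


def genome_of(fig_id: str) -> str:
    """'fig|224914.1.peg.18' -> '224914.1' (assumes the FIG ID layout above)."""
    return fig_id.split("|", 1)[-1].rsplit(".peg.", 1)[0]


def closest_genes(rows: Iterable[str], n: int = 3) -> List[Tuple[str, float, float]]:
    """Return (subject_id, bitscore, evalue) for the n best genes outside the query's genome."""
    best: Dict[str, Tuple[float, float]] = {}
    for row in rows:
        cols = row.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # skip blank lines, comments, and psiblast status lines
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if genome_of(subject) == genome_of(query):
            continue  # skip the query itself and its paralogs
        # Keep only the best-scoring alignment per subject gene.
        if subject not in best or bitscore > best[subject][0]:
            best[subject] = (bitscore, evalue)
    # Highest bit score first; ties broken by the smaller e-value.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1][0], kv[1][1]))
    return [(sid, bits, ev) for sid, (bits, ev) in ranked[:n]]
```

For example, you could pass the lines of a file written by blastp -query query.faa -db mydb -outfmt 6 -out hits.tsv directly to closest_genes. The file and database names there are hypothetical. Ranking by bit score rather than percent identity is deliberate, because bit score accounts for alignment length. A short fragment with high identity therefore does not outrank a full-length match.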
To address Problem 1.4, you need to get to KEGG. You can do
this through the SEED, or you can go directly to that system. If KEGG
has integrated the organism you are interested in, you can see what
metabolic functions can be directly connected to genes. If the
organism is in the SEED, but not KEGG, you will need to go through the
see what functional roles can be connected to genes.
In any event, you need to locate the portion of the metabolic reaction
network that is relevant and try to understand what possible pathways
the gene might participate in.
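Once you have found the relevant part of the network, the reasoning for Problem 1.4 comes down to set arithmetic. You need to know which pathways contain the gene's role, how many of each pathway's other roles the organism appears to have, and which of those roles are missing. The sketch below shows only that bookkeeping. Its pathway and role names are invented placeholders, not KEGG or SEED data.

```python
"""Rank candidate pathways for a gene's functional role and list the gaps.

Minimal sketch: PATHWAYS and the genome's role set are hypothetical
placeholders; fill them from the KEGG map or SEED subsystem you are studying.
"""
from typing import Dict, List, Set

# Hypothetical pathway -> functional roles it requires.
PATHWAYS: Dict[str, Set[str]] = {
    "pathway_X": {"role_A", "role_B", "role_C"},
    "pathway_Y": {"role_B", "role_D"},
    "pathway_Z": {"role_E", "role_F"},
}


def candidate_pathways(
    role: str, genome_roles: Set[str], pathways: Dict[str, Set[str]] = PATHWAYS
) -> List[dict]:
    """For each pathway containing `role`, report coverage and the roles not found."""
    report = []
    for name, members in pathways.items():
        if role not in members:
            continue
        found = members & genome_roles
        report.append({
            "pathway": name,
            "coverage": len(found) / len(members),
            "missing": sorted(members - genome_roles),
        })
    # Better-covered pathways first; they are the more plausible contexts.
    report.sort(key=lambda r: -r["coverage"])
    return report


if __name__ == "__main__":
    genome = {"role_A", "role_B", "role_D"}  # hypothetical roles found in the genome
    for entry in candidate_pathways("role_B", genome):
        print(entry)
```

The "missing" list corresponds to the other functional roles you would expect to find if the assignment is accurate. Each one is worth searching for in the genome before you conclude that it is absent.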
Problem 1.5 has a slight wrinkle; you need to understand how the FIG
similarities are presented and what it means to "expand raw sims".
Alternatively, you could just use psi-blast to locate the related
Problem 1.6: Pick a gene in Staphylococcus aureus subsp. aureus
MW2 (say, "recN"). Then, use the To Compare Regions link to find at
least 3 genes (from any of the Staph aureus genomes) that were
probably miscalled (either not called at all, or had incorrect start
positions). You need not do the analysis required to determine
exactly what the correct call should be (which is not something the
SEED supports very well); just identify where the potential problems
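One way to turn a Compare Regions display into a list of candidate miscalls is to compare the lengths of genes that appear to be the same across genomes. If one genome has no gene where the others have one, that may be a missed call. If a gene is much longer or shorter than its counterparts, it may have a different start (or a frameshift). The sketch below is a simple heuristic, not a SEED feature. The genome labels, lengths, and 10% tolerance in it are illustrative assumptions.

```python
"""Flag possible miscalled genes among similar genes from several genomes.

Minimal heuristic sketch (not the SEED's method): input is the protein
length in each genome for one set of similar genes, with None where no gene
was called. The 10% tolerance is an arbitrary illustrative choice.
"""
from statistics import median
from typing import Dict, List, Optional, Tuple


def flag_miscalls(
    lengths: Dict[str, Optional[int]], tolerance: float = 0.10
) -> List[Tuple[str, str]]:
    """Return (genome, reason) pairs for genes that look missing or oddly sized."""
    called = [n for n in lengths.values() if n is not None]
    if not called:
        return []
    typical = median(called)
    flags = []
    for genome, n in lengths.items():
        if n is None:
            flags.append((genome, "no gene called"))
        elif abs(n - typical) > tolerance * typical:
            # Far from the group median: possibly a different start or a frameshift.
            kind = "longer" if n > typical else "shorter"
            flags.append((genome, f"{kind} than median ({n} vs {typical:g} aa)"))
    return flags


if __name__ == "__main__":
    # Hypothetical protein lengths (aa) for one set of similar genes.
    lengths = {"genome_1": 560, "genome_2": 559, "genome_3": 560,
               "genome_4": 640, "genome_5": None}
    for genome, reason in flag_miscalls(lengths):
        print(genome, reason)
```

The median is used rather than the mean so that a single badly called gene does not shift the reference length it is compared against.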
To do Problem 1.6, you first need to position yourself on a gene from
the desired genome. This you should know how to do, so do it. Then,
look down the page. There is a section in which you have links to do
things like get the DNA sequence, the protein sequence and so forth.
One of those links should be To Compare Regions. That is the
one you want. Try it and look at the results. Notice that besides
seeing a colored display (in which genes that have the same color and
number are similar), there is a Commentary button that you can
click to get commentary on the functions of the genes in a