Assignment 1: Getting Started

The goal of this assignment is to acquaint you with how to use the SEED to extract basic information about a genome. We assume that you know how to access a version of the SEED and that you have access to someone who can help you get started.

You need to start by being able to locate specific genes. Suppose that you have just been reading an article in the Journal of Bacteriology and you wish to connect what you have been reading to what you see in the SEED. There are at least two common cases:

  1. you have a gene name or a function from the article, or
  2. you have a short fragment of sequence from the article.

Problem 1.1: Locate the gene in Brucella melitensis suis that was called BR0018 by the authors of the article describing the genome.
Problem 1.2: Locate the gene in Brucella melitensis suis that encodes a protein containing the following short amino acid sequence: "PTEPVWQGEIQGAGLGMAVDVWNDD".

Once you have located a gene, you should be able to answer the following questions about it:
Problem 1.3: What functional roles have been assigned to the gene BR0018 in Brucella melitensis suis?
Note that the FIG assignment differs from some of the others. Please describe the differences and say whether the assignments are incompatible.
Problem 1.4: What biochemical subsystem includes the gene, assuming the FIG assignment is correct? What other functional roles would you expect to exist, assuming the function is accurate?
Problem 1.5: Give the three "closest genes" from other organisms based on similarity.
To address problem 1.4, you need to somehow get to KEGG. You can do this through the SEED, or you can go directly to that system. If KEGG has integrated the organism you are interested in, you can see what metabolic functions can be directly connected to genes. If the organism is in the SEED, but not KEGG, you will need to go through the SEED to see what functional roles can be connected to genes. In any event, you need to locate the portion of the metabolic reaction network that is relevant and try to understand what possible pathways the gene might participate in.
Problem 1.5 has a slight wrinkle: you need to understand how the FIG similarities are presented and what it means to "expand raw sims". Alternatively, you could just use PSI-BLAST to locate the related genes and organisms.
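
However you obtain the similarities, picking the "closest genes" from other organisms comes down to discarding hits from the query's own genome and ranking the rest. The following is a minimal sketch under stated assumptions: the similarities have been exported as whitespace-separated rows of query, subject, E-value, and bit score (a layout chosen here for illustration, not the SEED's display format), and gene identifiers look like "fig|GENOME.peg.N". All identifiers and scores in the toy data are invented.

```python
def genome_of(peg_id):
    """Return the genome part of an ID assumed to look like 'fig|GENOME.peg.N'."""
    return peg_id.split("|", 1)[1].split(".peg.", 1)[0]


def parse_sims(text):
    """Parse rows of 'query subject evalue bitscore' into tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        query, subject, evalue, bitscore = fields
        rows.append((query, subject, float(evalue), float(bitscore)))
    return rows


def closest_from_other_organisms(rows, query_id, n=3):
    """Return the n best subjects for query_id that lie in a different genome."""
    query_genome = genome_of(query_id)
    candidates = [
        row for row in rows
        if row[0] == query_id and genome_of(row[1]) != query_genome
    ]
    # Smaller E-value is better; break ties with the larger bit score.
    candidates.sort(key=lambda row: (row[2], -row[3]))
    return [row[1] for row in candidates[:n]]


# Hypothetical toy data: every identifier and score below is invented.
EXAMPLE_SIMS = """
fig|111111.1.peg.18 fig|111111.1.peg.940 1e-80  290
fig|111111.1.peg.18 fig|222222.1.peg.21  1e-120 420
fig|111111.1.peg.18 fig|333333.1.peg.5   3e-95  340
fig|111111.1.peg.18 fig|444444.1.peg.77  2e-60  230
fig|111111.1.peg.18 fig|555555.1.peg.301 5e-110 395
"""

sims = parse_sims(EXAMPLE_SIMS)
closest = closest_from_other_organisms(sims, "fig|111111.1.peg.18")
print(closest)
```

Note that this ranks genes, not organisms, so two of the three answers could come from the same genome if it contains close paralogs. Whether that is what you want depends on how you read the problem.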
Problem 1.6: Pick a gene in Staphylococcus aureus subsp. aureus MW2 (say, "recN"). Then, use To Compare Regions to find at least 3 genes (from any of the Staph aureus genomes) that were probably miscalled (either not called at all, or called with incorrect start positions). You need not do the analysis required to determine exactly what the correct call should be (which is not something the SEED supports very well); just identify where the potential problems exist.
To do problem 1.6, you need to first position yourself on a gene from the desired genome. This you should know how to do, so do it. Then, look down the page. There is a section in which you have links to do things like get the DNA sequence, the protein sequence and so forth. One of those links should be To Compare Regions. That is the one you want. Try it and look at the results. Notice that besides seeing a colored display (in which genes that have the same color and number are similar), you have a button Commentary that you can click on to get a commentary of the functions of the genes in a different window.