
View of /FigTutorial/bioinf_first_class_part2.html


Revision 1.2
Thu Aug 11 14:57:37 2005 UTC (14 years, 3 months ago) by overbeek
additions to class writeups

<h1>The Initial Attempt to Produce a Metabolic Reconstruction</h1>

A metabolic reconstruction refers to an attempt to infer the metabolic
machinery of an organism from the sequenced genome and available
literature.  The term was introduced by Evgeni Selkov in his early
work on the first sequenced genomes.  Selkov made available his
substantial collection of encoded metabolic pathways, and those along with
existing encodings (most notably the wonderful pathway charts created
by Gerhard Michal and distributed by Boehringer Mannheim) launched
numerous efforts to encode the metabolism of sequenced organisms.
The major effort by <a href="http://www.genome.jp/kegg/">KEGG</a> has
become perhaps the best known, and it is the one the SEED effort has
tended to use.

Different groups have created slightly differing notions of what is
meant by <i>metabolic reconstruction</i>.  Within the context of this
course, we draw the following distinctions:
<li>By an <b>informal metabolic reconstruction</b> we refer to
taking the genes of an organism and dividing them into small groups
that each perform some well-defined cellular function,
identifying the overall function of each of these groups, and
attaching to each gene a list of the abstract functions that it
implements.
With informal metabolic reconstructions, it is common to include not
only metabolic subsystems (i.e., pathways), but nonmetabolic
subsystems, as well.
<li>By a <b>formal metabolic reconstruction</b> we refer to a detailed
encoding of the metabolic reaction network of the organism.
That is, the informal reconstruction attempts to represent as much of
the cellular machinery as possible.  It provides a solid foundation
on which the formal metabolic reconstruction can be based.  However,
the informal metabolic reconstruction has substantial value by itself.  There are
many aspects of the phenotype that can be inferred by just
qualitative reasoning based upon the presence or absence of specific
subsystems or functional roles.  Further, many aspects of the
biochemistry (e.g., "missing genes") can be analyzed from just the
perspective of the informal metabolic reconstruction.
The formal reconstruction is usually
limited to just metabolic reactions (and those reactions involving
generation or degradation of polymers are normally left out).  The
output of a formal metabolic reconstruction will include detailed
encodings of both the reactions and the compounds that appear in the
metabolic network.  
These distinctions are ours, and are not commonly used.
We consider them unimportant, but useful.
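
The bookkeeping behind an informal reconstruction can be sketched in a few lines of Python. This is only an illustration: the gene identifiers and the toy subsystem below are invented and are not taken from the SEED or KEGG. The point is that presence/absence reasoning, and the search for "missing genes", reduce to simple set operations once each gene carries its list of functional roles.

```python
# Hypothetical data: the gene IDs and the subsystem definition are made up
# for illustration and are not taken from SEED or KEGG.
gene_roles = {
    "gene_0001": ["Histidinol dehydrogenase"],
    "gene_0002": ["Imidazoleglycerol-phosphate dehydratase",
                  "Histidinol-phosphatase"],
    "gene_0003": ["ATP phosphoribosyltransferase"],
}

subsystems = {
    "Histidine biosynthesis (toy)": {
        "ATP phosphoribosyltransferase",
        "Imidazoleglycerol-phosphate dehydratase",
        "Histidinol-phosphatase",
        "Histidinol-phosphate aminotransferase",
        "Histidinol dehydrogenase",
    },
}


def roles_present(gene_roles):
    """Return the set of functional roles implemented by at least one gene."""
    return {role for roles in gene_roles.values() for role in roles}


def missing_roles(subsystems, gene_roles):
    """Map each subsystem to the roles that no gene has been assigned to."""
    present = roles_present(gene_roles)
    return {name: sorted(roles - present) for name, roles in subsystems.items()}


gaps = missing_roles(subsystems, gene_roles)
for name, roles in gaps.items():
    print(name, "is missing:", roles)
```

A role reported here is a candidate "missing gene": either the organism lacks that step, or the gene exists but has not yet been recognized.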

In this section of the course, we are asking the student to build both
an informal and a formal metabolic reconstruction for some sequenced
organism.  Clearly this is an ambitious task.  It would have been
largely impossible to do anything significant 10 years ago, but with
the new tools we believe that this effort can be quite productive as
an amazing crash course in biochemistry and microbial physiology.

Rather than break this part of the course up into weekly assignments
(at least for now), we list the detailed steps we would like the
student to work through.

We are going to suggest that each student be assigned a distinct
organism (alternatively, groups of students can work jointly on a
single organism).  We suggest choosing an organism that fulfills the
following criteria:

<li>It should be a sequenced prokaryotic genome of small to moderately
large size (450-2500 genes).

<li>It should be a genome for which metabolic reconstructions have not
already been done and are not known to be in progress.

<li>It should be in the public domain.

<li>The genome should be included in both the KEGG collection and in
the SEED collection.
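
If you keep your candidate organisms in a small table, the four criteria can be applied mechanically. The sketch below assumes you have gathered the gene count and the other facts by hand; the `Candidate` record, its field names, and the example organisms are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    gene_count: int
    in_kegg: bool
    in_seed: bool
    reconstruction_exists: bool  # already done, or known to be in progress
    public: bool


def is_suitable(c, min_genes=450, max_genes=2500):
    """Apply the four selection criteria from the list above."""
    return (
        min_genes <= c.gene_count <= max_genes
        and not c.reconstruction_exists
        and c.public
        and c.in_kegg
        and c.in_seed
    )


# Hypothetical candidates; names and numbers are placeholders.
candidates = [
    Candidate("organism A", 1800, True, True, False, True),
    Candidate("organism B", 4300, True, True, False, True),  # too large
    Candidate("organism C", 900, True, False, False, True),  # not in SEED
    Candidate("organism D", 600, True, True, True, True),    # already done
]

suitable = [c.name for c in candidates if is_suitable(c)]
print(suitable)
```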

<h2>Steps in the Process of Developing an Informal Metabolic Reconstruction</h2>

<h3>Getting summaries of what is in the genome</h3>

First, you should get two estimates of what cellular machinery is
present in the organism:
<li>You should get a list of the subsystems with operational variants
from a SEED installation.  
The easiest way to do this involves starting from the first page
of the SEED, asking for <b>Statistics</b> for the genome you are
working on, and then (near the bottom of the page) clicking on <b>Show ...</b>.

Note that the subsystems and genes that you
get back may include both well-curated subsystems and
poorly-constructed subsystems.
<li>You should get colored versions of the KEGG maps (showing which
functions are believed to be present in the genome).

<h3>Begin with the Common Machinery</h3>

There is a subset of the cellular machinery that will be present in
some form in whichever genome you picked.  The ribosomal RNA,
ribosomal proteins, tRNAs, tRNA synthetases, and so forth must all be
there.  Look through the set of subsystems that are present, decide
what aspects appear to be essential machinery relating to
transcription and translation, and begin with that.  Create a detailed
summary of which topics you have selected, which variants exist, and
which genes implement those variants.  Which rRNAs and tRNAs exist?
How many copies of the rRNA cluster exist?
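
A first pass at the rRNA and tRNA questions is a simple tally. The sketch below assumes you have exported the RNA features of your genome as (type, product) pairs; the feature list shown is invented. It also assumes that a complete rRNA cluster contains one 16S, one 23S and one 5S gene, so the count it gives is only an upper bound on the number of complete clusters. You still need to check that the genes are actually adjacent on the chromosome.

```python
from collections import Counter

# Hypothetical RNA features (type, product); not taken from a real genome.
rna_features = [
    ("rRNA", "16S"), ("rRNA", "23S"), ("rRNA", "5S"),
    ("rRNA", "16S"), ("rRNA", "23S"), ("rRNA", "5S"),
    ("tRNA", "tRNA-Ala"), ("tRNA", "tRNA-Gly"), ("tRNA", "tRNA-Ala"),
]

rrna_counts = Counter(product for kind, product in rna_features if kind == "rRNA")
trna_counts = Counter(product for kind, product in rna_features if kind == "tRNA")

# Assumption: one 16S, one 23S and one 5S gene per cluster, so the rarest
# of the three bounds the number of complete clusters.
complete_clusters = min(rrna_counts[s] for s in ("16S", "23S", "5S"))

print("rRNA genes:", dict(rrna_counts))
print("tRNA genes:", dict(trna_counts))
print("complete rRNA clusters (upper bound):", complete_clusters)
```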

<h3>Studying Amino Acid Synthesis</h3>

Next, we suggest turning to <i>amino acid metabolism</i>, or, more narrowly,
<i>the synthesis of amino acids</i>.  Identify which of the KEGG maps
address this section of metabolism, and then which subsystems from the
SEED are relevant.  Now prepare a list of the amino acids that can be
synthesized, along with the starting point in each case.  Make sure
that you compose a detailed list of outstanding questions.

<h3>Synthesis of Nucleotides</h3>

We suggest that you next turn your attention to synthesis of nucleotides.
Locate the appropriate KEGG charts and the relevant
subsystems.  Again, summarize the situation, along with outstanding
questions.

<h3>Systematically Work Through the Central Cellular Machinery</h3>

Between the SEED hierarchy, the KEGG maps, and the numerous examples
of metabolic reconstructions published in genome papers, you have
numerous examples of the basic components of a functional hierarchy.
You should choose a reasonable organizational style and produce
an HTML document comprising your best effort at an informal metabolic

<h2>The Basic Steps in Building a Formal Metabolic Reconstruction</h2>

You should begin by studying exactly how Bernhard Palsson and his team
have built formal metabolic reconstructions:
<a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12952533&query_hl=1"><i>Escherichia coli</i></a>,
<i>Staphylococcus aureus</i> and ...

You are being asked to construct a list of several hundred reactions,
where each reaction includes precise substrates, products and
(possibly) a required enzyme.
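
One way to hold such a list is sketched below. The `Reaction` record and its field names are our own illustration, not a format defined by the SEED, KEGG, or Palsson's group; the hexokinase entry is included only to show what "precise substrates and products" means in practice.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Reaction:
    name: str
    substrates: Dict[str, int]    # compound name -> stoichiometric coefficient
    products: Dict[str, int]
    enzyme: Optional[str] = None  # e.g. an EC number; None if not required
    reversible: bool = False


def compounds(reactions):
    """Return every compound that appears in at least one reaction."""
    found = set()
    for r in reactions:
        found.update(r.substrates)
        found.update(r.products)
    return found


# Illustrative entry: ATP + D-glucose -> ADP + D-glucose 6-phosphate
hexokinase = Reaction(
    name="hexokinase",
    substrates={"ATP": 1, "D-glucose": 1},
    products={"ADP": 1, "D-glucose 6-phosphate": 1},
    enzyme="EC 2.7.1.1",
)

print(sorted(compounds([hexokinase])))
```

Recording reversibility on each reaction from the start pays off later, when you ask which compounds are only consumed or only produced.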

<h3>Begin from the Informal Metabolic Reconstruction</h3>

You should begin from the informal metabolic reconstruction and
accumulate the reactions and compounds implied by the operational
variants of the subsystems.  This can be done by starting from the first page
of the SEED, asking for <b>Statistics</b> for the genome you are
working on, and then (near the bottom of the page) clicking on <b>Show ...</b>.
This tool produces an initial estimate of the reaction set.  
This initial set is far from complete and some of the reactions
presented will be encoded improperly.  Before continuing let us just
ponder what a "complete and accurate formal metabolic reconstruction"
would contain:
<li>It would contain all of the metabolic reactions.  For many
purposes it might be useful to exclude synthesis and degradation of
polymers.  On the other hand, for whole organism modeling, it becomes
necessary to estimate the compounds that can be transported into and
out of the cell.  This is quite difficult.  For the purposes of this
class you should ignore the reactions relating to polymers, and you
should ignore the issue of exactly what can be transported.

<li>The reaction set should not contain <i>class reactions</i> (those
in which the substrates and products are not specific).

<li>You can construct a list of compounds that must be
present based on this incomplete reaction set.  To get the initial
approximation of this set, ...

<li>From the list of reactions and compounds, you should be able to
produce a set of compounds that exist as substrates for one or more
reactions, but not as products.  This computation cannot be done
accurately without knowing which reactions are reversible.  This can
be done using ...

<li>Finally, you should produce a list of compounds that exist as
products, but not as substrates.  This can be done using ...
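
The three compound lists described above can be computed with a few set operations. The sketch below is our own illustration on an invented toy network, not the tool referred to in the text. It represents each reaction as a (substrates, products, reversible) triple and treats a reversible reaction as able to consume and produce the compounds on both of its sides, which is why reversibility must be known before the lists can be trusted.

```python
# Each reaction: (substrates, products, reversible). Toy network, invented.
reactions = [
    ({"A"}, {"B"}, False),
    ({"B"}, {"C"}, True),
    ({"C", "D"}, {"E"}, False),
]


def consumed_and_produced(reactions):
    """Collect compounds that can be consumed and compounds that can be produced.

    A reversible reaction can run either way, so both of its sides count
    as potential substrates and as potential products.
    """
    consumed, produced = set(), set()
    for substrates, products, reversible in reactions:
        consumed |= substrates
        produced |= products
        if reversible:
            consumed |= products
            produced |= substrates
    return consumed, produced


consumed, produced = consumed_and_produced(reactions)
all_compounds = sorted(consumed | produced)
only_substrates = sorted(consumed - produced)  # nothing in the set makes them
only_products = sorted(produced - consumed)    # nothing in the set uses them

print("compounds:", all_compounds)
print("substrates only:", only_substrates)
print("products only:", only_products)
```

Compounds that appear only as substrates must either be transported into the cell or be made by reactions still missing from the set; compounds that appear only as products point to missing downstream reactions or to material that leaves the cell.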

For the purposes of this class, construction of this initial, crude
formal metabolic reconstruction is both the best you can do and a
major achievement.  To refine it into a useful and accurate summary of
the metabolism of the cell is something that a person might work a
lifetime on.


The object of this portion of the class is to develop both an
initial <b>informal metabolic reconstruction</b> and a <b>formal
metabolic reconstruction</b> for some specific organism.  If you can
successfully achieve this, you will have done something that was
almost impossible even a few years ago.  If you study and reflect on
what you accomplish, it will form a starting point for deepening your
understanding of microbial physiology and biochemistry.
