Fri Mar 31 12:39:46 2006 UTC (13 years, 7 months ago) by overbeek
add TyrA example to tutorials

<h1>Ambiguity and the Choice of Functional Roles</h1>

I am writing this to capture the essence of a discussion among
Carol Bonner, Roy Jensen, Andrei Osterman, myself (Ross Overbeek), and Veronika Vonstein
as we tried to start a subsystem to capture the work of Carol and Roy relating to TyrA homologs.
The discussion is worth recording because the problem it exposed recurs frequently, and
other annotators have faced these same issues with somewhat
inconsistent responses.  The central questions of how to capture
specificity, cofactors, and uncertainty come up in many cases,
and we should seek a more-or-less consistent policy on how to handle
them.  We use the <b>TyrA</b> example just because it exposed these
issues so wonderfully.

<h2>The Overall Set of Reactions</h2>

As in many cases, we began by trying to describe the scope of the subsystem.  I argued
for embedding the discussion of TyrA within a more general discussion of
<b>Phenylalanine, Tyrosine and p-aminophenylpyruvate synthesis</b>, since the TyrA
homologs are embedded within these three biosynthesis pathways.  So, let me start by
trying to lay out the relevant reactions.  For now, I am going to ignore p-aminophenylpyruvate synthesis
and focus on just phenylalanine and tyrosine synthesis; we can add the third pathway if we can
get these two done correctly.  I believe that the relevant reactions are as follows:

<br><br>
<table border>
<caption><b>Two Paths to Phenylalanine</b></caption>
<tr><th>To KEGG</th><th>Reaction</th><th>Catalyzed By</th></tr>
<tr><td><a href="http://www.genome.ad.jp/dbget-bin/www_bget?rn+R01373" target=reaction22138>R01373</a></td><td>Prephenate => Phenylpyruvate + H2O + CO2</td><td>Prephenate dehydratase (EC 4.2.1.51)</td></tr>
<tr><td><a href="http://www.genome.ad.jp/dbget-bin/www_bget?rn+R00688" target=reaction22138>R00688</a></td><td>Phenylpyruvate + NH3 + NADH => L-Phenylalanine + H2O + NAD+</td><td>Phenylalanine dehydrogenase (EC 1.4.1.20)</td></tr>
<tr><td><a href="http://www.genome.ad.jp/dbget-bin/www_bget?rn+R00694" target=reaction22138>R00694</a></td><td>Phenylpyruvate + L-Glutamate <=> L-Phenylalanine + 2-Oxoglutarate</td><td>L-Phenylalanine:2-oxoglutarate aminotransferase</td></tr>
<tr><td>***</td><td></td><td></td></tr>
<tr><td><a href="http://www.genome.ad.jp/dbget-bin/www_bget?rn+R01731" target=reaction22138>R01731</a></td><td>L-Aspartate + Prephenate <=> Oxaloacetate + L-Arogenate</td><td>Aromatic-amino-acid aminotransferase (EC 2.6.1.57)</td></tr>
<tr><td><a href="http://www.genome.ad.jp/dbget-bin/www_bget?rn+R03120" target=reaction22138>R03120</a></td><td>L-glutamate + Prephenate <=> 2-Oxoglutarate + L-Arogenate</td><td>Aromatic-amino-acid aminotransferase (EC 2.6.1.57)</td></tr>
<tr><td><a href="http://www.genome.ad.jp/dbget-bin/www_bget?rn+R00691" target=reaction22138>R00691</a></td><td>L-Arogenate => L-Phenylalanine + H2O + CO2</td><td>Arogenate dehydratase (EC 4.2.1.91)</td></tr>
</table>
<hr>
<br>
<table border>
<caption><b>Two Paths to Tyrosine</b></caption>
<tr><th>To KEGG</th><th>Reaction</th><th>Catalyzed By</th></tr>
<tr><td><a href="http://www.genome.ad.jp/dbget-bin/www_bget?rn+R01731" target=reaction22138>R01731</a></td><td>L-Aspartate + Prephenate <=> Oxaloacetate + L-Arogenate</td><td>Aromatic-amino-acid aminotransferase (EC 2.6.1.57)</td></tr>
<tr><td><a href="http://www.genome.ad.jp/dbget-bin/www_bget?rn+R00732" target=reaction22337>R00732</a></td><td>L-Arogenate + NAD+ => L-Tyrosine + CO2 + NADH</td><td>Arogenate dehydrogenase, NAD specific (EC 1.3.1.43)</td></tr>
<tr><td><a href="http://www.genome.ad.jp/dbget-bin/www_bget?rn+R00733" target=reaction22337>R00733</a></td><td>L-Arogenate + NADP+ => L-Tyrosine + CO2 + NADPH</td><td>Arogenate dehydrogenase, NADP specific (EC 1.3.1.43)</td></tr>
<tr><td>***</td><td></td><td></td></tr>
<tr><td><a href="http://www.genome.ad.jp/dbget-bin/www_bget?rn+R01728" target=reaction22337>R01728</a></td><td>Prephenate + NAD+ => 4-hydroxyphenylpyruvate + CO2 + NADH + H+</td><td>Prephenate dehydrogenase, NAD specific (EC 1.3.1.13)</td></tr>
<tr><td><a href="http://www.genome.ad.jp/dbget-bin/www_bget?rn+R01730" target=reaction22337>R01730</a></td><td>Prephenate + NADP+ => 4-hydroxyphenylpyruvate + CO2 + NADPH + H+</td><td>Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)</td></tr>
<tr><td><a href="http://www.genome.ad.jp/dbget-bin/www_bget?rn+R00734" target=reaction22337>R00734</a></td><td>4-hydroxyphenylpyruvate + L-Glutamate <=> L-Tyrosine + 2-Oxoglutarate</td><td>Tyrosine aminotransferase (EC 2.6.1.5)</td></tr>
</table>
<p>

<h2>The TyrA Aspect of the Problem</h2>

The proteins that catalyze reactions R00732, R00733, R01728, and R01730 are extremely hard to disambiguate.  Further, many single proteins
catalyze several of these reactions.  This leads to two distinct approaches to choosing functional roles for the
subsystem.
<p>
<hr>
The following table illustrates the first approach:
<br><br>
<table border>
<tr><th>Gene</th><th>Functional Role</th></tr>
<tr><td><sub>NAD</sub>TyrA<sub>a</sub></td><td>Arogenate + NAD specific dehydrogenase (EC 1.3.1.12)</td></tr>
<tr><td><sub>NADP</sub>TyrA<sub>a</sub></td><td>Arogenate + NADP specific dehydrogenase (EC 1.3.1.12)</td></tr>
<tr><td><sub>NAD(P)</sub>TyrA<sub>a</sub></td><td>Arogenate specific + NAD(P) dehydrogenase (EC 1.3.1.12)</td></tr>
<tr><td><sub>x</sub>TyrA<sub>a</sub></td><td>Arogenate specific + NAD(P) unknown specificity dehydrogenase</td></tr>
<tr><td><sub>NAD</sub>TyrA<sub>c</sub></td><td>Cyclohexadienyl broad specificity + NAD dehydrogenase (EC 1.3.1.12)</td></tr>
<tr><td><sub>NADP</sub>TyrA<sub>c</sub></td><td>Cyclohexadienyl broad specificity + NADP dehydrogenase (EC 1.3.1.12)</td></tr>
<tr><td><sub>NAD(P)</sub>TyrA<sub>c</sub></td><td>Cyclohexadienyl broad specificity + NAD(P) dehydrogenase (EC 1.3.1.12)</td></tr>
<tr><td><sub>x</sub>TyrA<sub>c</sub></td><td>Cyclohexadienyl broad specificity + NAD(P) unknown specificity dehydrogenase (EC 1.3.1.12)</td></tr>
<tr><td><sub>NAD</sub>TyrA<sub>p</sub></td><td>Prephenate + NAD specific dehydrogenase (EC 1.3.1.12)</td></tr>
<tr><td><sub>NADP</sub>TyrA<sub>p</sub></td><td>Prephenate + NADP specific dehydrogenase (EC 1.3.1.12)</td></tr>
<tr><td><sub>NAD(P)</sub>TyrA<sub>p</sub></td><td>Prephenate specific + NAD(P) dehydrogenase (EC 1.3.1.12)</td></tr>
<tr><td><sub>x</sub>TyrA<sub>p</sub></td><td>Prephenate specific + NAD(P) unknown specificity dehydrogenase (EC 1.3.1.12)</td></tr>
<tr><td><sub>NAD</sub>TyrA<sub>x</sub></td><td>Substrate specificity unknown + NAD dehydrogenase (EC 1.3.1.12)</td></tr>
<tr><td><sub>NADP</sub>TyrA<sub>x</sub></td><td>Substrate specificity unknown + NADP dehydrogenase (EC 1.3.1.12)</td></tr>
<tr><td><sub>NAD(P)</sub>TyrA<sub>x</sub></td><td>Substrate specificity unknown + NAD(P) dehydrogenase (EC 1.3.1.12)</td></tr>
<tr><td><sub>x</sub>TyrA<sub>x</sub></td><td>Substrate specificity unknown + NAD(P) specificity unknown dehydrogenase (EC 1.3.1.12)</td></tr>
</table>

<br><br>

Using this approach, every gene has a function that connects to a single functional role that
conveys the level of ambiguity in specificity and current knowledge.
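<p>
The sixteen roles in the table above are the cross product of four cofactor states and four substrate states.
A minimal sketch of that enumeration follows; the plain-text symbol format (<code>NAD-TyrA-a</code>) and the
function name <code>gene_symbols</code> are illustrative assumptions, not part of any existing tool.

```python
from itertools import product

# Cofactor and substrate states as they appear in the gene symbols above;
# "x" marks an unknown specificity.
COFACTORS = ["NAD", "NADP", "NAD(P)", "x"]
# a = arogenate, c = cyclohexadienyl (broad), p = prephenate, x = unknown
SUBSTRATES = ["a", "c", "p", "x"]


def gene_symbols():
    """Return one plain-text symbol per (substrate, cofactor) combination.

    The symbol format is a hypothetical stand-in for the subscripted
    notation used in the table.
    """
    return [f"{cof}-TyrA-{sub}" for sub, cof in product(SUBSTRATES, COFACTORS)]


symbols = gene_symbols()
print(len(symbols))   # number of distinct roles the first approach needs
print(symbols[:4])    # the four arogenate-specific variants
```

Adding one more state on either axis adds a whole row or column of roles, which is why this approach grows quickly.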
<br>
<hr>
<br>
The second approach utilizes the "/", "@", and ";" connectives when specifying the gene function
and includes only four functional roles:
<br><br>
<table border>
<tr><th>Functional Role</th></tr>
<tr><td>Arogenate dehydrogenase, NAD specific (EC 1.3.1.43)</td></tr>
<tr><td>Arogenate dehydrogenase, NADP specific (EC 1.3.1.43)</td></tr>
<tr><td>Prephenate dehydrogenase, NAD specific (EC 1.3.1.13)</td></tr>
<tr><td>Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)</td></tr>

</table>
<br><br>

Under this approach one would make assignments of the form shown in the following table:

<br><br>
<table border>
<tr><th>Gene</th><th>Gene Function</th></tr>
<tr><td><sub>NAD</sub>TyrA<sub>a</sub></td><td>Arogenate dehydrogenase, NAD specific (EC 1.3.1.43)</td></tr>
<tr><td><sub>NADP</sub>TyrA<sub>a</sub></td><td>Arogenate dehydrogenase, NADP specific (EC 1.3.1.43)</td></tr>
<tr><td><sub>NAD(P)</sub>TyrA<sub>a</sub></td><td>Arogenate dehydrogenase, NAD specific (EC 1.3.1.43) @ Arogenate dehydrogenase, NADP specific (EC 1.3.1.43)</td></tr>
<tr><td><sub>x</sub>TyrA<sub>a</sub></td><td>Arogenate dehydrogenase, NAD specific (EC 1.3.1.43) ; Arogenate dehydrogenase, NADP specific (EC 1.3.1.43)</td></tr>
<tr><td><sub>NAD</sub>TyrA<sub>c</sub></td><td>Arogenate dehydrogenase, NAD specific (EC 1.3.1.43) ; Prephenate dehydrogenase, NAD specific (EC 1.3.1.13)</td></tr>
<tr><td><sub>NADP</sub>TyrA<sub>c</sub></td><td>Arogenate dehydrogenase, NADP specific (EC 1.3.1.43) ; Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)</td></tr>
<tr><td><sub>NAD(P)</sub>TyrA<sub>c</sub></td><td>Arogenate dehydrogenase, NAD specific (EC 1.3.1.43) @ Prephenate dehydrogenase, NAD specific (EC 1.3.1.13) @ Arogenate dehydrogenase, NADP specific (EC 1.3.1.43) @ Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)</td></tr>
<tr><td><sub>x</sub>TyrA<sub>c</sub></td><td>cannot be expressed</td></tr>
<tr><td><sub>NAD</sub>TyrA<sub>p</sub></td><td>Prephenate dehydrogenase, NAD specific (EC 1.3.1.13)</td></tr>
<tr><td><sub>NADP</sub>TyrA<sub>p</sub></td><td>Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)</td></tr>
<tr><td><sub>NAD(P)</sub>TyrA<sub>p</sub></td><td>Prephenate dehydrogenase, NAD specific (EC 1.3.1.13) @ Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)</td></tr>
<tr><td><sub>x</sub>TyrA<sub>p</sub></td><td>Prephenate dehydrogenase, NAD specific (EC 1.3.1.13) ; Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)</td></tr>
<tr><td><sub>NAD</sub>TyrA<sub>x</sub></td><td>Arogenate dehydrogenase, NAD specific (EC 1.3.1.43); Prephenate dehydrogenase, NAD specific (EC 1.3.1.13)</td></tr>
<tr><td><sub>NADP</sub>TyrA<sub>x</sub></td><td>Arogenate dehydrogenase, NADP specific (EC 1.3.1.43); Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)</td></tr>
<tr><td><sub>NAD(P)</sub>TyrA<sub>x</sub></td><td>cannot be expressed</td></tr>
<tr><td><sub>x</sub>TyrA<sub>x</sub></td><td>Arogenate dehydrogenase, NAD specific (EC 1.3.1.43) ; Prephenate dehydrogenase, NAD specific (EC 1.3.1.13) ; Arogenate dehydrogenase, NADP specific (EC 1.3.1.43) ; Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)</td></tr>
</table>
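<p>
To connect such a gene function back to the four functional roles, the string has to be split at its connectives.
The sketch below is one way to do that. It assumes (this document does not define the connectives) that
"/" and "@" are always surrounded by spaces, that ";" is followed by a space, and that ";" is the connective
that signals uncertainty. The function names <code>roles_in</code> and <code>is_uncertain</code> are hypothetical.

```python
import re

# Assumption: " / " and " @ " are space-delimited, while ";" may directly
# follow the preceding role, as in "(EC 1.3.1.43); Prephenate ...".
_SPLIT = re.compile(r"\s+[/@]\s+|\s*;\s+")


def roles_in(function):
    """Split a gene function string into its component functional roles."""
    return [part.strip() for part in _SPLIT.split(function) if part.strip()]


def is_uncertain(function):
    """True when the function uses ';', taken here to signal uncertainty."""
    return re.search(r";\s", function) is not None


both = ("Arogenate dehydrogenase, NAD specific (EC 1.3.1.43) @ "
        "Arogenate dehydrogenase, NADP specific (EC 1.3.1.43)")
either = ("Arogenate dehydrogenase, NAD specific (EC 1.3.1.43); "
          "Prephenate dehydrogenase, NAD specific (EC 1.3.1.13)")

print(roles_in(both))
print(is_uncertain(both), is_uncertain(either))
```

Each returned role can then be matched against a column of the subsystem spreadsheet.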

<br><br>
<hr>
I find the above table a little unsettling.  Since this is the approach that I advocated, I find it doubly unsettling.
When one adds the fact that we have a similar situation with the aminotransferases, the difficulties involved in
specifying the roles of the whole subsystem (in a way that would be transparent and make obvious sense to most biologists)
become pretty forbidding.
<p>
I also feel that I need to add a few comments made by Roy in order to set the stage properly for this discussion:
<br>
<ul>
<li>First, Roy feels that we should focus on the TyrA issues and settle them, since getting on to the "Stage 3" analysis
was one of our original goals in starting this discussion.
<li>Roy also notes that the EC numbers are not particularly helpful.  Even a quick perusal of the KEGG maps versus the
simple table of reactions I gave earlier makes it clear that the use of ECs, at least in this case, probably obscures
more than it reveals.  
<li>If we try to do the entire subsystem that I mentioned at the start, then 80-90% of the difficulty will relate
to disambiguating aminotransferases.
</ul>
<p>
<br>
<h2>The Hybrid Alternative</h2>

One proposal that has arisen, based on things a number of annotators have been doing to avoid some
of the difficulties, would go as follows:
<br>
<ul>
<li>
In the subsystem, use the four functional roles given as the second alternative, but add one more: 
<b>Cyclohexadienyl dehydrogenase of unknown substrate and cofactor specificity</b>. 
<li>
For each gene that has not been disambiguated, use this new role as the function.
<li>
For each gene that is completely disambiguated, use the expressions from the second alternative (based on ";" and "@").
<li>
For a gene that has been partially characterized (e.g., it is known that it plays at least the role 
<b>Arogenate dehydrogenase, NADP specific (EC 1.3.1.43)</b> and maybe more) use the known functions followed by
<b>; Cyclohexadienyl dehydrogenase of unknown substrate and cofactor specificity</b>.  That is, the gene would
match the precise functional roles, as well as the "catch-all" role we added to the spreadsheet.
</ul>
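<p>
The rules above can be sketched as a small function. This is only an illustration under stated assumptions:
the name <code>hybrid_function</code> and its two parameters are hypothetical, and joining several known roles
with "@" is a simplification (a curator might instead choose ";" or "/" depending on what is known).

```python
CATCH_ALL = ("Cyclohexadienyl dehydrogenase of unknown substrate "
             "and cofactor specificity")


def hybrid_function(known_roles, complete):
    """Build a gene function string under the hybrid proposal.

    known_roles: precise functional roles the gene is known to play
                 (an empty list for a gene that is not disambiguated).
    complete:    True when the gene is considered fully disambiguated.
    """
    if not known_roles:
        # Not disambiguated at all: the catch-all role is the function.
        return CATCH_ALL
    # Assumption: multiple known roles are joined with "@".
    expression = " @ ".join(known_roles)
    if complete:
        return expression
    # Partially characterized: known roles followed by the catch-all.
    return f"{expression} ; {CATCH_ALL}"


print(hybrid_function([], complete=False))
print(hybrid_function(
    ["Arogenate dehydrogenase, NADP specific (EC 1.3.1.43)"],
    complete=False,
))
```

A partially characterized gene therefore matches both its precise role and the catch-all column.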
<p>
Thus, we would use the following functional roles:
<br><br>
<table border>
<tr><th>Functional Role</th></tr>
<tr><td>Arogenate dehydrogenase, NAD specific (EC 1.3.1.43)</td></tr>
<tr><td>Arogenate dehydrogenase, NADP specific (EC 1.3.1.43)</td></tr>
<tr><td>Prephenate dehydrogenase, NAD specific (EC 1.3.1.13)</td></tr>
<tr><td>Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)</td></tr>
<tr><td>Cyclohexadienyl dehydrogenase of unknown substrate and cofactor specificity</td></tr>
</table>
<br><br>


This leads to the following ways to express gene function:
<br>
<br><br>
<table border>
<tr><th>Gene</th><th>Gene Function</th></tr>
<tr><td><sub>NAD</sub>TyrA<sub>a</sub></td><td>Arogenate dehydrogenase, NAD specific (EC 1.3.1.43)</td></tr>
<tr><td><sub>NADP</sub>TyrA<sub>a</sub></td><td>Arogenate dehydrogenase, NADP specific (EC 1.3.1.43)</td></tr>
<tr><td><sub>NAD(P)</sub>TyrA<sub>a</sub></td><td>Arogenate dehydrogenase, NAD specific (EC 1.3.1.43) @ Arogenate dehydrogenase, NADP specific (EC 1.3.1.43)</td></tr>
<tr><td><sub>x</sub>TyrA<sub>a</sub></td><td>Arogenate dehydrogenase, NAD specific (EC 1.3.1.43) ; Arogenate dehydrogenase, NADP specific (EC 1.3.1.43); Cyclohexadienyl dehydrogenase of unknown substrate and cofactor specificity</td></tr>
<tr><td><sub>NAD</sub>TyrA<sub>c</sub></td><td>Cyclohexadienyl dehydrogenase of unknown substrate and cofactor specificity</td></tr>
<tr><td><sub>NADP</sub>TyrA<sub>c</sub></td><td>Cyclohexadienyl dehydrogenase of unknown substrate and cofactor specificity</td></tr>
<tr><td><sub>NAD(P)</sub>TyrA<sub>p</sub></td><td>Prephenate dehydrogenase, NAD specific (EC 1.3.1.13) @ Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)</td></tr>
<tr><td><sub>x</sub>TyrA<sub>c</sub></td><td>Cyclohexadienyl dehydrogenase of unknown substrate and cofactor specificity</td></tr>
<tr><td><sub>NAD</sub>TyrA<sub>p</sub></td><td>Prephenate dehydrogenase, NAD specific (EC 1.3.1.13)</td></tr>
<tr><td><sub>NADP</sub>TyrA<sub>p</sub></td><td>Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)</td></tr>
<tr><td><sub>NAD(P)</sub>TyrA<sub>p</sub></td><td>Prephenate dehydrogenase, NAD specific (EC 1.3.1.13) @ Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)</td></tr>
<tr><td><sub>x</sub>TyrA<sub>p</sub></td><td>Prephenate dehydrogenase, NAD specific (EC 1.3.1.13) ; Prephenate dehydrogenase, NADP specific (EC 1.3.1.13); Cyclohexadienyl dehydrogenase of unknown substrate and cofactor specificity</td></tr>
<tr><td><sub>NAD</sub>TyrA<sub>x</sub></td><td>Cyclohexadienyl dehydrogenase of unknown substrate and cofactor specificity</td></tr>
<tr><td><sub>NADP</sub>TyrA<sub>x</sub></td><td>Cyclohexadienyl dehydrogenase of unknown substrate and cofactor specificity</td></tr>
<tr><td><sub>NAD(P)</sub>TyrA<sub>x</sub></td><td>Cyclohexadienyl dehydrogenase of unknown substrate and cofactor specificity</td></tr>
<tr><td><sub>x</sub>TyrA<sub>x</sub></td><td>Cyclohexadienyl dehydrogenase of unknown substrate and cofactor specificity</td></tr>
</table>

<br><br>
These are not completely satisfactory, but I believe they are pretty close to what we want.
In a few cases, information is lost.  For these very few, I believe that the curator can maintain external
records until things clarify.
<p>
Anyone reading the spreadsheet can get an accurate grasp of what is known with high probability and what
issues remain murky.
<p>
<h2>The Issue of How Much Detail to Include in Functional Roles</h2>

The previous discussion focused on how to represent uncertainty in a functional role.  It is, perhaps, 
worth noting that Roy and Carol decided to go ahead with their original choices of functional roles.
Any of the above solutions would lead to conflicts with existing subsystems containing the 
role <b>Prephenate dehydrogenase (EC 1.3.1.13)</b>.  In effect, when the <b>TyrA subsystem</b> is 
added to our collection, it will force a resolution with the existing subsystems.
Given the choice to go with sixteen separate functional roles, each containing both specificity
and uncertainty information, this will mean replacing a single functional role in a number of subsystems
with sixteen distinct columns.
<p>
A similar issue will arise as we consider the <i>chorismate mutase</i>, which occurs in a closely
related piece of metabolism.  In this case, there are three distinct,
nonhomologous forms of the enzyme.  The question here is <b>"Should we
have a single chorismate mutase column (representing comments relating
to form as either notes, annotations, or attributes), or should we
have three distinct columns?"</b>  Veronika puts all three forms in a
single column, while a number of us have adopted the convention of
placing nonhomologous alternatives in separate columns.
 
<p>
Veronika feels (obviously correctly) that as the spreadsheet becomes
huge, we lose the ability to maintain an overview.  The detail swamps
the representation of the essential.
<p>
I am exploring the issue of how well we can have both detail and
overview by proper use of subsets.  I will report on my experiments at
the April meeting, undoubtedly using aromatic metabolism as a setting
for exploring the issues.
<p>
Before leaving this topic, let us consider the position that 
including the cofactors in the functional role obscures the situation
rather than clarifying it.  We have chosen (in most cases) to leave
properties like <i>thermostable</i> out of the functional role.  Why
include cofactors?  Which cofactors are needed is important, but we
consciously have chosen not to keep all important aspects of the
function within the actual functional role.  Might it not be better to
save the cofactor elsewhere?  I am reluctant to do so, since I would
like to capture the actual reactions.  On the other hand, we can
attach multiple reactions to a single functional role.
This view would lead to the following version:
<br><br><br>
<table border>
<tr><th>Functional Role</th></tr>
<tr><td>Arogenate dehydrogenase (EC 1.3.1.43)</td></tr>
<tr><td>Prephenate dehydrogenase (EC 1.3.1.13)</td></tr>
</table>
<br><br>
Uncertainty or ambiguity would be represented using the ";" and "/"
operators.
There is a certain appeal to this brevity.
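<p>
Attaching multiple reactions to a single functional role could look like the following sketch. The KEGG
reaction IDs and their cofactors are taken from the "Two Paths to Tyrosine" table; the dictionary layout and
the function name <code>reactions_for</code> are assumptions made for illustration.

```python
# Two cofactor-free roles, each carrying the reactions it may catalyze.
ROLE_REACTIONS = {
    "Arogenate dehydrogenase (EC 1.3.1.43)": ["R00732", "R00733"],
    "Prephenate dehydrogenase (EC 1.3.1.13)": ["R01728", "R01730"],
}

# The cofactor lives with the reaction rather than in the role name.
REACTION_COFACTOR = {
    "R00732": "NAD",
    "R00733": "NADP",
    "R01728": "NAD",
    "R01730": "NADP",
}


def reactions_for(role, cofactor=None):
    """Reactions attached to a role, optionally narrowed to one cofactor."""
    reactions = ROLE_REACTIONS.get(role, [])
    if cofactor is None:
        return list(reactions)
    return [r for r in reactions if REACTION_COFACTOR[r] == cofactor]


print(reactions_for("Arogenate dehydrogenase (EC 1.3.1.43)"))
print(reactions_for("Prephenate dehydrogenase (EC 1.3.1.13)", cofactor="NADP"))
```

The cofactor information is kept, but it no longer multiplies the number of spreadsheet columns.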
<p>
Veronika believes that there should be three functional roles: the two
above and <b>Cyclohexadienyl dehydrogenase</b>, which would cover either
broad specificity or uncertainty.
<hr>
This is where the discussion stands at the moment.  I am actually feeling extremely satisfied that these issues
are being cast so vividly in a form we must address. 
