Splitting Functional Roles
As subsystems are refined, there is a growing need to add more precision
to the functional roles. The most obvious case arises with the introduction
of eukaryotic genomes. In eukaryotes, many paralogs exist for some
functional roles, and these paralogs represent equivalent functionality
in different compartments, in different tissues, or at different stages
of development. In such cases, it is normally considered essential that
the gene function convey location, developmental stage, and/or tissue when
these details are known.
A similar situation arises in prokaryotes when the same enzymatic activity occurs
in different pathways, and some organisms employ distinct versions of
the enzyme. Often the main use of each copy can be distinguished, and
it is considered important that the distinction be captured.
There have been extended discussions about whether such subdivision of
functional roles is even desirable. The question of whether detail should be
expressed within the functional role, as opposed to in "attributes" or
"annotations", remains a source of disagreement.
In my view, it makes sense to subdivide functional roles, as long as
one retains the ability to work with a spreadsheet in which they can
be collapsed back into a single column. This is the exact
functionality provided by column subsets that begin with an asterisk.
That is, if one defines a subset named *PFK consisting of columns 5, 6 and 7,
it states that these columns (which presumably encode alternative
versions of an enzyme you wish to abbreviate as PFK) can normally be
collapsed into a single column. By checking the "ignore alternatives"
option, one makes the fully expanded spreadsheet available.
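To make the collapsing behaviour concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory model in which one genome's spreadsheet row maps 1-based column numbers to the genes in each cell, and subsets map a name to a list of column numbers. Neither this model nor the function `collapse_row` is the SEED's actual code; both are illustrative.

```python
from typing import Dict, List

# Hypothetical model: one genome's spreadsheet row, mapping 1-based column
# numbers (as in "columns 5, 6 and 7") to the genes filling each cell.
Row = Dict[int, List[str]]


def collapse_row(row: Row, subsets: Dict[str, List[int]],
                 ignore_alternatives: bool = False) -> Dict[str, List[str]]:
    """Return the row keyed by displayed column label.

    Columns in a subset whose name begins with '*' are merged under that
    subset's name. With ignore_alternatives=True nothing is merged, giving
    the fully expanded row.
    """
    merged_into: Dict[int, str] = {}
    if not ignore_alternatives:
        for name, columns in subsets.items():
            if name.startswith("*"):
                for col in columns:
                    merged_into[col] = name

    view: Dict[str, List[str]] = {}
    for col in sorted(row):
        # Unmerged columns keep their own number as the label.
        label = merged_into.get(col, str(col))
        view.setdefault(label, []).extend(row[col])
    return view


row = {4: ["peg.10"], 5: ["peg.21"], 6: [], 7: ["peg.33"]}
subsets = {"*PFK": [5, 6, 7]}
print(collapse_row(row, subsets))
# {'4': ['peg.10'], '*PFK': ['peg.21', 'peg.33']}
print(collapse_row(row, subsets, ignore_alternatives=True))
# {'4': ['peg.10'], '5': ['peg.21'], '6': [], '7': ['peg.33']}
```

In this sketch merging only changes the view; the underlying columns are untouched, which is what lets the expanded form be recovered at any time.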
I believe that this rudimentary ability to introduce more precision
into functional roles, while supporting the ability to collapse the
extended set back to a single column, is adequate for handling many of
the problematic cases we are now encountering.
Exactly How to Do It Cleanly
The key problem in introducing subdivisions is that they may affect
numerous subsystems. Hence, I propose the following procedure.
First, determine who owns the subsystems that would be impacted by
splitting a functional role, and establish agreement on the split.
For things to proceed, all affected curators must agree
that the split can take place and on exactly what set of functional roles
will be employed to represent the distinctions. It is essential that there
be complete agreement before anyone proceeds with a split. Normally,
a subdivision would include the original functional role, which would
be used to represent cases in which the added precision could not be
determined. For example, suppose that we chose to split the
functional role Threonine dehydratase (EC 4.3.1.19) into
Threonine dehydratase (EC 4.3.1.19)
Threonine dehydratase biosynthetic (EC 4.3.1.19)
Threonine dehydratase catabolic (EC 4.3.1.19)
following the Swiss-Prot classification. We would normally still leave the
imprecise role in place to cover cases in which the extra precision was not needed.
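Reaching agreement is a social step, but finding who has to agree is mechanical. The sketch below assumes a hypothetical `Subsystem` record with `name`, `owner` and `roles` fields, and treats agreement as the set of owners who have signed off on one specific proposed split. The names `affected_owners` and `may_proceed`, and the sample data, are illustrative only (EC numbers are omitted from role names for brevity).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Subsystem:
    # Hypothetical record; these field names are illustrative, not a SEED API.
    name: str
    owner: str
    roles: List[str] = field(default_factory=list)


def affected_owners(subsystems: List[Subsystem], role: str) -> Dict[str, List[str]]:
    """Map each curator to the subsystems they own that contain `role`."""
    owners: Dict[str, List[str]] = {}
    for ss in subsystems:
        if role in ss.roles:
            owners.setdefault(ss.owner, []).append(ss.name)
    return owners


def may_proceed(owners: Dict[str, List[str]], agreed: Set[str]) -> bool:
    """True only if every affected owner has agreed to this exact split."""
    return set(owners) <= agreed


subsystems = [
    Subsystem("Isoleucine biosynthesis", "alice", ["Threonine dehydratase"]),
    Subsystem("Threonine degradation", "carol", ["Threonine dehydratase"]),
    Subsystem("Glycolysis", "bob", ["Phosphofructokinase"]),
]
owners = affected_owners(subsystems, "Threonine dehydratase")
print(owners)   # {'alice': ['Isoleucine biosynthesis'], 'carol': ['Threonine degradation']}
print(may_proceed(owners, {"alice"}))           # False
print(may_proceed(owners, {"alice", "carol"}))  # True
```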
Once a split is agreed upon, a single curator should run the utility
split_functional_role.cgi. This utility accepts the original functional
role, as well as the set of roles it should be split into. For each
subsystem on the machine on which it is run that contains the role being
split, it replaces that role with the new set (which should, in most
cases, include the original role). In addition, it adds a subset
representing the entire set.
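The behaviour described for split_functional_role.cgi can be sketched in Python as follows. This is not the utility's source: the `Subsystem` fields, the function signature, and the subset name `*ThrDH` are assumptions made for illustration, and EC numbers are omitted from the role names for brevity. The new roles are spliced in at the original column's position so the surrounding column order is preserved.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Subsystem:
    # Hypothetical model: ordered role columns plus named subsets of roles.
    name: str
    roles: List[str] = field(default_factory=list)
    subsets: Dict[str, List[str]] = field(default_factory=dict)


def split_functional_role(subsystems: List[Subsystem], original: str,
                          new_roles: List[str], subset_name: str) -> List[str]:
    """Replace `original` by `new_roles` in every subsystem that uses it.

    The new roles are spliced in where the original column was, and a subset
    covering the whole set is added. Returns the names of changed subsystems.
    """
    changed: List[str] = []
    for ss in subsystems:
        if original not in ss.roles:
            continue
        i = ss.roles.index(original)
        before, after = ss.roles[:i], ss.roles[i + 1:]
        # Skip roles the subsystem already has elsewhere, so no column repeats.
        inserted = [r for r in new_roles if r not in before and r not in after]
        ss.roles = before + inserted + after
        ss.subsets[subset_name] = list(new_roles)
        changed.append(ss.name)
    return changed


new = ["Threonine dehydratase",
       "Threonine dehydratase biosynthetic",
       "Threonine dehydratase catabolic"]
ile = Subsystem("Isoleucine biosynthesis",
                ["Threonine dehydratase", "Acetolactate synthase"])
gly = Subsystem("Glycolysis", ["Phosphofructokinase"])
print(split_functional_role([ile, gly], "Threonine dehydratase", new, "*ThrDH"))
# ['Isoleucine biosynthesis']
print(ile.roles)
print(ile.subsets)
```

Giving the subset a name that begins with an asterisk is a choice made here so that, under the convention described earlier, the new columns collapse back into one by default.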
Then, curators need to alter gene functions to reflect the added
precision (if they wish) and refill the spreadsheets.
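Refilling a spreadsheet amounts to re-matching each gene's assigned function against the subsystem's roles. The sketch below assumes exact string equality between a gene function and a role name, which is a simplification; the real matching rules may differ. `refill_row`, the gene identifiers, and their functions are hypothetical.

```python
from typing import Dict, List


def refill_row(roles: List[str], gene_functions: Dict[str, str]) -> Dict[str, List[str]]:
    """Rebuild one genome's row: each role column gets the genes whose
    assigned function equals that role exactly (a simplifying assumption)."""
    row: Dict[str, List[str]] = {role: [] for role in roles}
    for gene in sorted(gene_functions):
        function = gene_functions[gene]
        if function in row:
            row[function].append(gene)
    return row


roles = ["Threonine dehydratase",
         "Threonine dehydratase biosynthetic",
         "Threonine dehydratase catabolic"]
functions = {  # hypothetical gene IDs and assignments for one genome
    "peg.7": "Threonine dehydratase biosynthetic",
    "peg.3": "Threonine dehydratase catabolic",
    "peg.1": "Threonine dehydratase",
    "peg.9": "hypothetical protein",
}
print(refill_row(roles, functions))
```

In this model a gene still annotated with the imprecise role lands in the imprecise column, which is why the gene functions are updated before the spreadsheets are refilled.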
In some cases, curators will only wish to retain a subset of the more
precise functional roles within their subsystems, and hence they may
wish to delete one or more of the inserted functional roles.
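In the hypothetical model below, deleting an inserted role should also remove it from any subset that lists it; otherwise the subset would refer to a column that no longer exists. A minimal sketch, assuming a `Subsystem` with `roles` and `subsets` fields (`delete_roles` is an illustrative name, not an existing utility):

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Subsystem:
    # Hypothetical model: ordered role columns plus named subsets of roles.
    name: str
    roles: List[str] = field(default_factory=list)
    subsets: Dict[str, List[str]] = field(default_factory=dict)


def delete_roles(ss: Subsystem, unwanted: List[str]) -> None:
    """Remove `unwanted` roles from the columns and from every subset.

    A subset left with no roles is dropped entirely.
    """
    drop = set(unwanted)
    ss.roles = [r for r in ss.roles if r not in drop]
    for name in list(ss.subsets):  # copy keys: we may delete while looping
        kept = [r for r in ss.subsets[name] if r not in drop]
        if kept:
            ss.subsets[name] = kept
        else:
            del ss.subsets[name]


thr = ["Threonine dehydratase",
       "Threonine dehydratase biosynthetic",
       "Threonine dehydratase catabolic"]
ile = Subsystem("Isoleucine biosynthesis",
                thr + ["Acetolactate synthase"], {"*ThrDH": list(thr)})
# This curator keeps only the generic and biosynthetic forms.
delete_roles(ile, ["Threonine dehydratase catabolic"])
print(ile.roles)
print(ile.subsets)
```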
Finally, the modified subsystems should be published.