Splitting Functional Roles

As subsystems are refined, there is a growing need to add more precision to the functional roles. The most obvious case relates to the introduction of eukaryotic genomes. There, many paralogs exist for some functional roles, and these paralogs represent equivalent functionality in different compartments, in different tissues, or at different stages of development. In these cases, it is normally considered essential that the gene function convey location, developmental stage, and/or tissue when these details are known.

A similar case occurs in prokaryotes when the same enzymatic activity occurs within different pathways, and some organisms employ distinct versions of the enzyme. Often the main use of each copy can be distinguished, and it is considered important that the distinction be captured.

There have been extended discussions about whether or not such subdivision is even desirable. The issue of whether detail should be expressed within the functional role, as opposed to "attributes" or "annotations", remains a source of disagreement.

In my view, it makes sense to subdivide functional roles, as long as one retains the ability to work with a spreadsheet in which they can be collapsed back into a single column. This is the exact functionality provided by column subsets that begin with an asterisk. That is, if one defines

	*PFK     5,6,7
it states that columns 5, 6 and 7 (which presumably encode alternative versions of an enzyme you wish to abbreviate as PFK) can normally be collapsed into a single column. By checking the option "ignore alternatives", the fully expanded spreadsheet becomes available.

I believe that this rudimentary ability to introduce more precision into functional roles, while supporting the ability to collapse the extended set back to a single column, is adequate for handling many of the problematic cases we are now encountering.
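
The collapsing behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the code behind the spreadsheet display: the function names parse_subset and collapse_row, the representation of a row as a list of cells (each cell a list of gene IDs), and the choice to put the merged cell at the position of the first column of the subset are all assumptions made for the example.

```python
def parse_subset(line):
    """Parse a subset definition such as '*PFK     5,6,7'.

    Returns the subset name and its 1-based column numbers.
    (Hypothetical helper; the real file format may carry more fields.)
    """
    name, cols = line.split()
    return name, [int(c) for c in cols.split(",")]


def collapse_row(row, columns, ignore_alternatives=False):
    """Collapse the given 1-based columns of one spreadsheet row.

    row is a list of cells, each cell a list of gene IDs (an assumed
    data model). With ignore_alternatives=True the row is returned
    fully expanded, mirroring the option described in the text.
    """
    if ignore_alternatives:
        return [list(cell) for cell in row]

    members = set(columns)
    first = min(members)
    # Gather the genes from every alternative column, in column order.
    merged = [gene for c in sorted(members) for gene in row[c - 1]]

    collapsed = []
    for idx, cell in enumerate(row, start=1):
        if idx == first:
            collapsed.append(merged)      # single column for the subset
        elif idx not in members:
            collapsed.append(list(cell))  # unrelated column, kept as is
    return collapsed


name, columns = parse_subset("*PFK     5,6,7")
row = [["a"], ["b"], [], [], ["g1"], [], ["g2", "g3"], ["z"]]
collapsed = collapse_row(row, columns)
expanded = collapse_row(row, columns, ignore_alternatives=True)
```

Here columns 5, 6 and 7 of the example row end up in one cell of the collapsed view, while the expanded view keeps all eight columns.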

Exactly How to Do It Cleanly

The key problem in introducing subdivisions relates to the fact that they may impact numerous subsystems. Hence, I am proposing the following protocol:
  1. First, determine who owns the subsystems that would be impacted by splitting a functional role, and establish agreement on the split. For things to proceed, it is required that all affected curators agree that the split can take place and exactly what set of functional roles will be employed to represent distinctions. It is essential that there be complete agreement before anyone proceeds with a split. Normally, a subdivision would include the original functional role, which would be used to represent cases in which the added precision could not be determined. For example, suppose that we chose to split the functional role Threonine dehydratase (EC into
    	Threonine dehydratase (EC
    	Threonine dehydratase biosynthetic (EC
    	Threonine dehydratase catabolic (EC

    following the Swiss-Prot classification. We would normally still leave in the imprecise role to cover cases in which the extra precision is not needed or is unknown.
  2. Once a split is agreed upon, a single curator should run the utility split_functional_role.cgi. This utility accepts the original functional role, as well as the set of roles it should be split into. For each subsystem on the machine on which it is run that contains the functional role to be split, it replaces that role with the set (which should, in most cases, include the original role). In addition, it adds a subset to represent the entire set.
  3. Then, curators need to alter gene functions to reflect the added precision (if they wish) and refill the spreadsheets.
  4. In some cases, curators will only wish to retain a subset of the more precise functional roles within their subsystems, and hence they may wish to delete one or more of the inserted functional roles.
  5. Finally, the modified subsystems should be published.
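
The effect of step 2 on a single subsystem can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of split_functional_role.cgi: the Python function below merely borrows the utility's name, a subsystem is modelled as an ordered list of role names plus a dictionary of column subsets, the subset name "*TD" is invented, role names are written without their EC suffix, and the renumbering of existing subsets after the insertion is a guess at reasonable behaviour.

```python
def split_functional_role(roles, subsets, original, replacements, subset_name):
    """Replace one role with a set of roles and add a covering subset.

    roles        -- ordered role names; column i holds roles[i - 1]
    subsets      -- dict: subset name -> list of 1-based column numbers
    original     -- the role to be split
    replacements -- roles to insert in its place (normally includes original)
    subset_name  -- hypothetical name for the subset covering the new set
    Returns new (roles, subsets); the inputs are left unmodified.
    """
    if original not in roles:
        # This subsystem does not use the role, so nothing changes.
        return list(roles), {k: list(v) for k, v in subsets.items()}

    pos = roles.index(original)              # 0-based position of the role
    new_cols = list(range(pos + 1, pos + 1 + len(replacements)))
    new_roles = roles[:pos] + list(replacements) + roles[pos + 1:]
    shift = len(replacements) - 1            # columns to the right move over

    def remap(col):
        if col - 1 < pos:
            return [col]                     # left of the split: unchanged
        if col - 1 == pos:
            return new_cols                  # the split column expands
        return [col + shift]                 # right of the split: shifted

    new_subsets = {
        name: [c2 for c in cols for c2 in remap(c)]
        for name, cols in subsets.items()
    }
    new_subsets[subset_name] = new_cols      # subset representing the set
    return new_roles, new_subsets


roles = ["Role A", "Threonine dehydratase", "Role B"]
subsets = {"*X": [2, 3]}
new_roles, new_subsets = split_functional_role(
    roles,
    subsets,
    "Threonine dehydratase",
    [
        "Threonine dehydratase",
        "Threonine dehydratase biosynthetic",
        "Threonine dehydratase catabolic",
    ],
    "*TD",
)
```

Because the original role is included in the replacement set, genes whose precise role cannot be determined still have a column to occupy, and the added subset lets the three columns be collapsed back into one. Steps 3 to 5 (editing gene functions, refilling, pruning unwanted roles, publishing) remain manual curator work and are not modelled here.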