Abstract
Determining the molecular function of enzymes discovered by genome sequencing represents a primary foundation for understanding many aspects of biology. Historically, classification of enzyme reactions has used the enzyme nomenclature system developed to describe the overall reactions performed by biochemically characterized enzymes, irrespective of their associated sequences. In contrast, functional classification and assignment for the millions of protein sequences of unknown function now available is largely done in two computational steps: first by similarity-based assignment of newly obtained sequences to homologous groups, and then by transferring to them the known functions of similar biochemically characterized homologs. Due to the fundamental differences in their etiologies and practice, how these chemistry- and evolution-centric functional classification systems relate to each other has been difficult to explore on a large scale. To investigate this issue in a new way, we integrated two published ontologies that had previously described each of these classification systems independently. The resulting infrastructure was then used to compare the functional assignments obtained from each classification system for the well-studied and functionally diverse enolase superfamily. Mapping these function assignments to protein structure and reaction similarity networks shows a profound and complex disconnect between the homology- and chemistry-based classification systems. This conclusion mirrors previous observations suggesting that, except for closely related sequences, facile annotation transfer from small numbers of characterized enzymes to the huge number of uncharacterized homologs to which they are related is problematic.
Our extension of these comparisons to large enzyme superfamilies in a computationally intelligent manner provides a foundation for new directions in protein function prediction for the huge proportion of sequences of unknown function represented in major databases. Interactive sequence, reaction, substrate and product similarity networks computed for this work for the enolase and two other superfamilies are freely available for download from the Structure Function Linkage Database Archive (http://sfld.rbvi.ucsf.edu).
Highlights
Examinations of enzyme chemistry and enzyme evolution have been done using separate and distinct approaches to organize, compare and disseminate each type of data.
The most foundational source for naming experimentally determined enzyme reactions is provided by the Enzyme Commission (EC), which defines catalytic reactions using a hierarchical set of four-digit numbers that run from least to most specific descriptors [4, 5]. (Importantly, for this work, the third digit of the EC system designates an overall enzyme reaction, while the fourth digit designates substrate specificity.) In addition to the enzyme nomenclature data, many other resources provide online access to more in-depth types of information about enzyme chemistry, including overall chemical transformations and functional features, such as kinetic details and mechanisms of reactions.
From a protein-centric viewpoint, features of proteins can be represented in an evolutionary context in which sequences and structures can be compared among homologous members of a superfamily to identify conserved features likely to be associated with their specific molecular functions.
Summary
Examinations of enzyme chemistry and enzyme evolution have been done using separate and distinct approaches to organize, compare and disseminate each type of data. (Importantly, for this work, the third digit of the EC system designates an overall enzyme reaction, while the fourth digit designates substrate specificity.) In addition to the enzyme nomenclature data, many other resources provide online access to more in-depth types of information about enzyme chemistry, including overall chemical transformations and functional features, such as kinetic details and mechanisms of reactions. These may include some sequence features, e.g. active site residues with descriptions of their functions.
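To make the hierarchical reading of EC numbers concrete, the following is a minimal Python sketch, not part of the published infrastructure. It splits an EC number into its four fields and counts how many leading fields two numbers share, so that agreement through the third field corresponds to the same overall reaction and agreement through the fourth field to the same substrate specificity, as described above. The function names parse_ec and shared_ec_depth are hypothetical, the EC numbers in the usage note other than enolase (4.2.1.11) are illustrative inputs only, and treating "-" as an unassigned field is an assumption about how partial EC numbers are written.

```python
def parse_ec(ec: str) -> tuple:
    """Split an EC number such as '4.2.1.11' into its four fields (as strings)."""
    parts = tuple(ec.strip().split("."))
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec!r}")
    return parts


def shared_ec_depth(a: str, b: str) -> int:
    """Count how many leading EC fields two numbers share (0 to 4).

    A '-' field (assumed here to mark an unassigned level) ends the comparison,
    so partial EC numbers never count as agreeing at the unassigned level.
    """
    depth = 0
    for x, y in zip(parse_ec(a), parse_ec(b)):
        if x != y or x == "-":
            break
        depth += 1
    return depth


if __name__ == "__main__":
    # Enolase is EC 4.2.1.11; the other numbers are illustrative inputs.
    for other in ("4.2.1.11", "4.2.1.40", "5.1.2.2"):
        print("4.2.1.11", other, shared_ec_depth("4.2.1.11", other))
```

Under this sketch, a depth of 3 means two assignments agree on the overall reaction but differ in substrate specificity, while a depth of 0 means they differ already at the top-level enzyme class. Comparing depth for pairs of homologous sequences is one simple way to look at how far chemistry-based labels diverge within a superfamily.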