Abstract

Despite advances in sequencing technology, there are still significant numbers of well-characterized enzymatic activities for which there are no known associated sequences. These ‘orphan enzymes’ represent glaring holes in our biological understanding, and it is a top priority to reunite them with their coding sequences. Here we report a methodology for resolving orphan enzymes through a combination of database search and literature review. Using this method we were able to reconnect over 270 orphan enzymes with their corresponding sequence. This success points toward how we can systematically eliminate the remaining orphan enzymes and prevent the introduction of future orphan enzymes.

Highlights

  • Nucleotide or amino-acid sequence data is the lingua franca that connects disparate branches of modern biology

  • Documents were collected, including texts cited in BRENDA, and sequences were identified by literature search for enzyme names. We identified sequence data for putative orphan enzymes via a combination of literature and database searches, partial sequence data, and the use of other identifying information in combination with a sequenced genome

  • It is likely that the characteristics of future putative orphan enzymes that have not yet been assigned EC numbers will match those we evaluated in the current work


Summary

Introduction

Nucleotide or amino-acid sequence data is the lingua franca that connects disparate branches of modern biology. Confronted with a novel amino acid sequence with no known function, researchers search sequence databases such as the NCBI non-redundant protein sequences database for significant hits [1]. From these results they receive clues to protein function in the form of predicted binding sites, catalytic sites, structural motifs, protein family membership, and identification of highly similar characterized proteins. Researchers infer a wealth of knowledge about a protein’s function based on the sequence data before a single lab experiment is performed.

Orphan enzymes are those enzymes that have been experimentally characterized but lack associated amino acid sequences. The remaining 20% of orphan enzymes are predicted to have sequence data available, buried in papers and patents, or incorrectly annotated in sequence databases. This literature and database approach is the most cost-effective way to find sequence data for orphan enzymes, as it involves no new experiments or experimental validation of computationally predicted candidate sequences. We have outlined a process that can be applied to evaluating the remaining orphan enzyme activities.
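The definition of an orphan enzyme can be made concrete with a small sketch. The Python below is purely illustrative and is not the authors' pipeline: it assumes a hypothetical table that maps each characterized activity to whatever sequence accessions are known for it, and flags the entries with none as orphans. The activity labels, accession strings, and the `find_orphans` helper are all invented for illustration.

```python
from typing import Dict, List


def find_orphans(activity_to_sequences: Dict[str, List[str]]) -> List[str]:
    """Return activities that have no associated sequence, sorted by label."""
    return sorted(
        activity
        for activity, accessions in activity_to_sequences.items()
        if not accessions
    )


# Hypothetical records: labels and accessions are placeholders, not real data.
records = {
    "activity_A": ["SEQ_0001"],
    "activity_B": [],
    "activity_C": ["SEQ_0002", "SEQ_0003"],
    "activity_D": [],
}

orphans = find_orphans(records)
print(orphans)
```

In practice the table would presumably be built from an enzyme resource and a sequence database; the sketch only shows the bookkeeping step of separating activities with sequences from those without.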
