Abstract

The emergence of Next Generation Sequencing generates an enormous amount of sequence data and great potential for new enzyme discovery. Despite this huge amount of data and the profusion of bioinformatic methods for function prediction, a large part of known enzyme activities still lacks an associated protein sequence. These particular activities are called “orphan enzymes”. The present review updates previous surveys of orphan enzymes by mining the current content of public databases. While the percentage of orphan enzyme activities has decreased from 38% to 22% in ten years, there are still more than 1,000 orphans among the 5,000 entries of the Enzyme Commission (EC) classification. When all the reactions present in metabolic databases are taken into account, this proportion rises dramatically to nearly 50%, and many of these orphans are not associated with a known pathway. We extended our survey to “local orphan enzymes”, i.e. activities that have no representative sequence in a given clade but have at least one in organisms belonging to other clades. We observe a strong bias in Archaea and find that, in general, more than 30% of the EC activities have incomplete sequence information in at least one superkingdom. To estimate whether candidate proteins for local orphans could be retrieved by homology search, we applied a simple strategy based on the PRIAM software and found that candidates may be proposed for a substantial fraction of local orphan enzymes. Finally, a study of the relation between protein domains and catalyzed activities shows that newly discovered enzymes are mostly associated with already known enzyme domains. Thus, exploring the promiscuity and multifunctionality of known enzyme families may solve part of the orphan enzyme issue.
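The difference between global and local orphans can be made concrete as a small classification rule. The Python sketch below is illustrative only and is not the authors' pipeline: the input dictionaries, the placeholder identifiers `EC-A` to `EC-D`, and the function names are all hypothetical. It also assumes, beyond the one-sentence definition above, that an activity counts as a local orphan in a clade only if the activity has been reported in that clade.

```python
from typing import Dict, Set

# Hypothetical toy data (placeholder identifiers, not real EC entries):
# activity -> superkingdoms in which the activity has been reported
REPORTED: Dict[str, Set[str]] = {
    "EC-A": {"Archaea", "Bacteria", "Eukaryota"},
    "EC-B": {"Bacteria", "Eukaryota"},
    "EC-C": {"Archaea"},
    "EC-D": {"Archaea", "Bacteria"},
}
# activity -> superkingdoms with at least one associated protein sequence
SEQUENCED: Dict[str, Set[str]] = {
    "EC-A": {"Bacteria", "Eukaryota"},
    "EC-B": {"Eukaryota"},
    "EC-C": set(),
    "EC-D": {"Archaea", "Bacteria"},
}


def global_orphans(reported: Dict[str, Set[str]],
                   sequenced: Dict[str, Set[str]]) -> Set[str]:
    """Activities with no associated sequence in any superkingdom."""
    return {ec for ec in reported if not sequenced.get(ec)}


def local_orphans(reported: Dict[str, Set[str]],
                  sequenced: Dict[str, Set[str]],
                  clade: str) -> Set[str]:
    """Activities reported in `clade` that lack a sequence from `clade`
    but have at least one sequence from another clade (assumed rule)."""
    result = set()
    for ec, clades in reported.items():
        seq_clades = sequenced.get(ec, set())
        # A non-empty set that does not contain `clade` means some other
        # clade provides a sequence for this activity.
        if clade in clades and seq_clades and clade not in seq_clades:
            result.add(ec)
    return result


def orphan_fraction(orphans: Set[str], reported: Dict[str, Set[str]]) -> float:
    """Share of all reported activities that fall in `orphans`."""
    return len(orphans) / len(reported) if reported else 0.0


if __name__ == "__main__":
    g = global_orphans(REPORTED, SEQUENCED)
    print("global:", sorted(g), orphan_fraction(g, REPORTED))
    for clade in ("Archaea", "Bacteria", "Eukaryota"):
        print(clade, sorted(local_orphans(REPORTED, SEQUENCED, clade)))
```

In this toy set, `EC-C` is a global orphan (no sequence anywhere), whereas `EC-A` is a local orphan for Archaea and `EC-B` for Bacteria, because sequences exist in other superkingdoms. Applying the same rule to real data would first require mapping each source organism to its superkingdom, a step not shown here.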
We conclude this review with a presentation of recent initiatives in finding proteins for orphan enzymes and in extending the enzyme world by the discovery of new activities.

Reviewers

This article was reviewed by Michael Galperin, Daniel Haft and Daniel Kahn.

Highlights

  • The emergence of Next Generation Sequencing generates an enormous amount of sequence data and great potential for new enzyme discovery

  • The number of orphan enzymes increased considerably in each superkingdom, with a high proportion of local orphans (52% for Eukaryota and Bacteria, and 91% for Archaea). These results should be taken with caution, as FRENDA/AMENDA data are not subject to manual curation

  • We found that PRIAM can retrieve candidate proteins for a non-negligible fraction of local orphans previously defined using BRENDA data: 30% for Archaea and Bacteria, and 59% for Eukaryota

Conclusion

Despite an observed decrease in the number of orphan enzyme activities over the last ten years, the orphan enzyme challenge remains important: more than 30% of the enzymatic activities reported in the EC classification have no sequence information or only incomplete sequence information.

The manuscript includes updated analyses of existing data from public databanks that substantially enhance our knowledge about orphan enzymes. I could support publication of this paper only after the authors include (at least as Supplementary Materials) the lists of global and local orphans from Figure S2.

Reviewer 1 (Second Round): Dr Michael Galperin

Previous authors’ response: We added the lists of global and local orphans and proteins in Supplementary Materials 2 and 3. These lists could be very useful for future studies.