The CanOE Strategy: Integrating Genomic and Metabolic Contexts across Multiple Prokaryote Genomes to Find Candidate Genes for Orphan Enzymes

Adam Alexander Thil Smith, David Vallenet, Alain Viari, Eugeni Belda, Claudine Medigue, Christos A Ouzounis

doi:10.1371/journal.pcbi.1002540

Abstract

Of all biochemically characterized metabolic reactions formalized by the IUBMB, more than one in four has yet to be associated with a nucleic acid or protein sequence; these are sequence-orphan enzymatic activities. Few bioinformatics annotation tools can propose candidate genes for such activities by exploiting context-dependent rather than sequence-dependent data, and none is both readily accessible and able to integrate results across multiple genomes. Here, we present CanOE (Candidate genes for Orphan Enzymes), a four-step bioinformatics strategy that proposes ranked candidate genes for sequence-orphan enzymatic activities (orphan enzymes for short). The first step locates “genomic metabolons”, i.e. groups of co-localized genes coding for proteins that catalyze reactions linked by shared metabolites, in one genome at a time. These metabolons can be particularly helpful to bioanalysts for visualizing relevant metabolic data. In the second step, the metabolons are used to generate candidate associations between un-annotated genes and gene-less reactions. The third step integrates these gene-reaction associations over several genomes using gene families, and summarizes the strength of each family-reaction association with several scores. In the final step, these scores are used to rank the members of gene families proposed for metabolic reactions. These associations are of particular interest when the metabolic reaction is a sequence-orphan enzymatic activity. Our strategy found over 60,000 genomic metabolons in more than 1,000 prokaryote organisms from the MicroScope platform, generating candidate genes for many metabolic reactions, including more than 70 distinct orphan reactions. A computational validation of the approach is discussed. Finally, we present a case study on the anaerobic allantoin degradation pathway in Escherichia coli K-12.
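The four steps of the strategy can be illustrated with a deliberately simplified Python sketch. It is not the CanOE implementation: the data model (`Gene` with a chromosomal `position` and a gene-family label), the `max_gap` co-localization window, the union-find grouping, and the single score used here (the number of genomes supporting a family-reaction pair, standing in for the several scores the strategy actually computes) are all hypothetical assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Gene:
    gene_id: str
    position: int          # order of the gene along its chromosome
    family: str            # hypothetical gene-family label shared across genomes
    reactions: set = field(default_factory=set)  # empty set = un-annotated gene


def shares_metabolite(r1, r2, metabolites):
    """True if two reactions have at least one metabolite in common."""
    return bool(metabolites[r1] & metabolites[r2])


def find_metabolons(genes, metabolites, max_gap=3):
    """Step 1 (simplified): groups of annotated genes that lie at most max_gap
    positions apart and catalyze reactions linked by a shared metabolite."""
    annotated = sorted((g for g in genes if g.reactions), key=lambda g: g.position)
    parent = {g.gene_id: g.gene_id for g in annotated}

    def find(x):  # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(annotated):
        for b in annotated[i + 1:]:
            if b.position - a.position > max_gap:
                break  # genes are sorted, so later ones are even farther away
            if any(shares_metabolite(r1, r2, metabolites)
                   for r1 in a.reactions for r2 in b.reactions):
                parent[find(a.gene_id)] = find(b.gene_id)

    groups = defaultdict(list)
    for g in annotated:
        groups[find(g.gene_id)].append(g)
    return [grp for grp in groups.values() if len(grp) >= 2]


def candidate_associations(genes, metabolon, gene_less, metabolites, max_gap=3):
    """Step 2 (simplified): pair un-annotated genes near a metabolon with
    gene-less reactions that share a metabolite with the metabolon's reactions."""
    lo = min(g.position for g in metabolon) - max_gap
    hi = max(g.position for g in metabolon) + max_gap
    nearby = [g for g in genes if not g.reactions and lo <= g.position <= hi]
    metabolon_rxns = set().union(*(g.reactions for g in metabolon))
    linked = [r for r in gene_less
              if any(shares_metabolite(r, m, metabolites) for m in metabolon_rxns)]
    return [(g, r) for g in nearby for r in linked]


def rank_candidates(genomes, gene_less, metabolites, max_gap=3):
    """Steps 3-4 (simplified): score each (family, reaction) pair by the number
    of genomes supporting it, then list candidate genes by decreasing score."""
    support = defaultdict(set)  # (family, reaction) -> supporting genomes
    members = defaultdict(set)  # (family, reaction) -> candidate gene ids
    for name, genes in genomes.items():
        for metabolon in find_metabolons(genes, metabolites, max_gap):
            for gene, rxn in candidate_associations(
                    genes, metabolon, gene_less[name], metabolites, max_gap):
                support[(gene.family, rxn)].add(name)
                members[(gene.family, rxn)].add(gene.gene_id)
    ranked = sorted(support, key=lambda k: (-len(support[k]), k))
    return [(fam, rxn, len(support[(fam, rxn)]), sorted(members[(fam, rxn)]))
            for fam, rxn in ranked]


# Toy data: reactions R1 (A->B), R2 (B->C) and a gene-less reaction R3 (C->D).
metabolites = {"R1": {"A", "B"}, "R2": {"B", "C"}, "R3": {"C", "D"}}
genomes = {
    "G1": [Gene("g1_1", 1, "F1", {"R1"}), Gene("g1_2", 2, "F2", {"R2"}),
           Gene("g1_3", 3, "Fx"), Gene("g1_9", 20, "Fy")],
    "G2": [Gene("g2_1", 10, "F1", {"R1"}), Gene("g2_2", 11, "F2", {"R2"}),
           Gene("g2_3", 12, "Fx")],
}
gene_less = {"G1": ["R3"], "G2": ["R3"]}
print(rank_candidates(genomes, gene_less, metabolites))
# [('Fx', 'R3', 2, ['g1_3', 'g2_3'])]
```

In the toy data, the un-annotated family `Fx` sits next to an R1/R2 metabolon in both genomes, so it is proposed for the gene-less reaction R3 with support from two genomes, while `Fy` (too far away) is not proposed. A realistic version would likely also need to ignore ubiquitous metabolites such as water or ATP when linking reactions; this sketch does not.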

Highlights

27% of all enzymatic activities recognized by the IUBMB [www.iubmb.org] are still sequence-orphan metabolic activities in the UniProt databank [1], a number that has decreased slowly over the past years [2,3,4]
Due to the lack of any sequence data, tools based on sequence similarity detection cannot be used to solve the “orphan enzyme” problem, and research has turned to context-based approaches
In our benchmarking experiment, we considered the set of all metabolic reactions having at least one known gene-reaction association involved in a metabolon
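The benchmark reaction set described in the last highlight can be expressed as a short filter. This is a minimal sketch under hypothetical assumptions: known associations are given as (gene, reaction) pairs, each metabolon is a list of gene identifiers, and an association counts as "involved in a metabolon" when its gene belongs to one, which is our reading rather than a definition stated here.

```python
def benchmark_reactions(known, metabolons):
    """Reactions with at least one known gene-reaction association whose gene
    belongs to a metabolon.

    known: iterable of (gene_id, reaction_id) pairs (hypothetical format)
    metabolons: iterable of metabolons, each an iterable of gene ids
    """
    genes_in_metabolons = {gene for metabolon in metabolons for gene in metabolon}
    return {rxn for gene, rxn in known if gene in genes_in_metabolons}


known = [("g1", "R1"), ("g2", "R2"), ("g5", "R7")]
metabolons = [["g1", "g2"], ["g8", "g9"]]
print(sorted(benchmark_reactions(known, metabolons)))  # ['R1', 'R2']
```

Reaction R7 is excluded because its only known gene, `g5`, is not part of any metabolon.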

Summary

Introduction

27% of all enzymatic activities recognized by the IUBMB [www.iubmb.org] are still sequence-orphan metabolic activities (dubbed “orphan enzymes” for short) in the UniProt databank [1], a number that has decreased only slowly over the past years [2,3,4]. It would be too time-consuming and costly to conduct wet-lab experiments testing all known activities against all genes from the exponentially increasing number of sequenced genomes. Context-based strategies have since been put into practice in various bioinformatics platforms such as IMG [10], MicroScope [11,12], the SEED [13] and ERGO [14].

Journal: PLoS Computational Biology
Publication Date: May 31, 2012
Citations: 72
License type: CC BY 4.0

Similar Papers

Prediction and identification of sequences coding for orphan enzymes using genomic and metagenomic neighbours
Takuji Yamada ... Kiran R Patil
Molecular systems biology | VOL. 8 | 01 Jan 2012

ACE gene, physical activity, and physical fitness.
Stephen H Day ... Sukhbir Dhamrait
Journal of applied physiology: respiratory, environmental and exercise physiology | VOL. 93 | 01 Oct 2002

Profiling the orphan enzymes
Maria Sorokina ... Mark Stam
Biology Direct | VOL. 9 | 01 Jan 2014

Separation of genetic functions controlling organ identity in flowers.
Emma Keck ... Paula Mcsteen
The EMBO Journal | VOL. 22 | 03 Mar 2003