Abstract

Experimental techniques for identifying essential genes (EGs) in prokaryotes are usually expensive, time-consuming and sometimes infeasible. Emerging in silico methods offer alternatives for EG prediction, but they often have limitations, including heavy computational requirements and a lack of biological explanation. Here we propose a new computational algorithm for EG prediction in prokaryotes, together with an online database (ePath) that provides quick access to EG prediction results for over 4,000 prokaryotes (https://www.pubapps.vcu.edu/epath/). In ePath, gene essentiality is linked to biological functions annotated by KEGG Ortholog (KO). Two new scores, E_score and P_score, are proposed for each KO as EG evaluation criteria. E_score reflects the appearance and essentiality of a given KO in existing experimental gene essentiality results, while P_score denotes gene essentiality based on the principle that a gene is essential if it plays a role in genetic information processing, cell envelope maintenance or energy production. Validated against five new experimental EG identification studies, the algorithm achieves prediction accuracies ranging from 75% to 91%. Our overall goal with ePath is to provide a comprehensive and reliable reference for gene essentiality annotation, facilitating the study of prokaryotes that lack experimentally derived gene essentiality information.
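
The P_score principle and the accuracy figure can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each gene has already been assigned a set of functional-category labels through its KO annotation, that P_score is a simple 0/1 flag, and that accuracy is the fraction of genes whose predicted call matches the experimental call. The category strings, the function names `p_score` and `accuracy`, and the toy data are illustrative and are not taken from ePath.

```python
# Hypothetical sketch of a principle-based P_score and an accuracy check.
# The category labels, the binary 0/1 score and the toy data are assumptions
# for illustration only, not ePath's actual definitions.

ESSENTIAL_ROLES = {
    "genetic information processing",
    "cell envelope maintenance",
    "energy production",
}


def p_score(roles: set[str]) -> int:
    """Return 1 if the gene's KO-derived roles include an 'essential' role, else 0."""
    return 1 if roles & ESSENTIAL_ROLES else 0


def accuracy(predicted: dict[str, int], observed: dict[str, int]) -> float:
    """Fraction of shared genes whose predicted essentiality matches the experiment."""
    shared = predicted.keys() & observed.keys()
    if not shared:
        raise ValueError("no genes in common between prediction and experiment")
    correct = sum(predicted[g] == observed[g] for g in shared)
    return correct / len(shared)


if __name__ == "__main__":
    # Hypothetical gene -> functional categories (as if derived from KO annotation).
    roles = {
        "gene_a": {"genetic information processing"},
        "gene_b": {"amino acid metabolism"},
        "gene_c": {"energy production", "transport"},
        "gene_d": set(),
    }
    predicted = {gene: p_score(r) for gene, r in roles.items()}
    # Made-up experimental calls (1 = essential, 0 = non-essential).
    observed = {"gene_a": 1, "gene_b": 0, "gene_c": 0, "gene_d": 0}
    print(predicted)
    print(accuracy(predicted, observed))
```

On this toy data three of the four predictions agree with the made-up experimental calls, so the sketch reports an accuracy of 0.75. A real evaluation would also have to decide how to handle genes without a KO annotation, which this sketch ignores.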

Highlights

  • Essential genes (EGs) are defined as those genes that are critical for the survival of an organism[1,2]

  • Predictions based on metabolic models are constrained by the availability of a model for the organism of interest[21]. Such predictions are available only for genes involved in metabolic pathways; other genes, such as those involved in genetic information processing and some cell envelope maintenance genes, are excluded

  • We selected 31 strains from the Database of Essential Genes (DEG) whose essential genes (EGs) had been identified experimentally (Table 1). These EGs were linked to KEGG Ortholog (KO) numbers
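
Linking experimentally identified EGs to KO numbers across strains amounts to tallying, for each KO, how many strains carry it and in how many of them it was found essential, which is the kind of information E_score is described as capturing. The Python sketch below rests on stated assumptions: the input layout (a strain name mapped to a list of `(gene_id, ko, is_essential)` tuples), the rule that a KO counts as essential in a strain if any gene annotated with it is essential, and the names `KOTally`, `tally_kos`, `essential_fraction` and `appearance_fraction` are all hypothetical. How ePath actually combines appearance and essentiality into E_score is not specified here, so the sketch stops at the two fractions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class KOTally:
    """Per-KO counts across the selected strains (hypothetical structure)."""
    appears: int = 0    # strains in which at least one gene carries this KO
    essential: int = 0  # strains in which a gene with this KO was essential


def tally_kos(strains: dict[str, list[tuple[str, str, bool]]]) -> dict[str, KOTally]:
    """Count KO appearance and essentiality over strains.

    `strains` maps a strain name to (gene_id, ko, is_essential) tuples.
    """
    tallies: dict[str, KOTally] = defaultdict(KOTally)
    for genes in strains.values():
        # Collapse genes to KOs within one strain: a KO is counted as
        # essential in this strain if any gene annotated with it is essential.
        ko_essential: dict[str, bool] = {}
        for _gene_id, ko, is_essential in genes:
            ko_essential[ko] = ko_essential.get(ko, False) or is_essential
        for ko, is_essential in ko_essential.items():
            tallies[ko].appears += 1
            tallies[ko].essential += int(is_essential)
    return dict(tallies)


def essential_fraction(t: KOTally) -> float:
    """Share of the strains carrying the KO in which it was essential."""
    return t.essential / t.appears if t.appears else 0.0


def appearance_fraction(t: KOTally, n_strains: int) -> float:
    """Share of all selected strains that carry the KO."""
    return t.appears / n_strains


if __name__ == "__main__":
    # Made-up essentiality calls for three strains; KO numbers are illustrative.
    strains = {
        "strain_1": [("g1", "K02946", True), ("g2", "K00001", False)],
        "strain_2": [("h1", "K02946", True), ("h2", "K00001", True),
                     ("h3", "K00001", False)],
        "strain_3": [("j1", "K02946", False)],
    }
    tallies = tally_kos(strains)
    for ko, t in sorted(tallies.items()):
        print(ko, essential_fraction(t), appearance_fraction(t, len(strains)))
```

On the toy data, K02946 appears in all three strains and is essential in two of them, while K00001 appears in two strains and is essential in one. With the 31 DEG strains, `n_strains` would simply be 31.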


Summary

Introduction

Essential genes (EGs) are defined as those genes that are critical for the survival of an organism[1,2]. Available experimental results come from different methods applied in different settings, and they are more reliable for model organisms such as Escherichia coli and Bacillus subtilis. Generating such results for other organisms is not a simple task. Machine learning algorithms for EG prediction require existing gene essentiality information derived from laboratory experiments[18], as well as extensive computational resources. Although they may show relatively high predictive power within their training sets, their general applicability outside their data domain remains largely uncertain.

