Abstract

Accurately predicting essential genes is important in many aspects of biology, medicine and bioengineering. In previous research, we developed a machine-learning-based integrative algorithm to predict essential genes in bacterial species. This algorithm supports two approaches to predicting essential genes: learning the traits of known essential genes in the target organism, or transferring essential-gene annotations from a closely related model organism. For an understudied microbe, however, each approach has potential limitations. The first is constrained by the often small number of known essential genes; the second is limited by the availability of suitable model organisms and by evolutionary distance. In this study, we aim to determine the optimal strategy for predicting essential genes by examining four microbes with well-characterized essential genes. Our results suggest that, unless very few essential genes are known, learning from the known essential genes in the target organism usually outperforms transferring annotations from a related model organism. In fact, surprisingly few known essential genes are needed to make accurate predictions. In prokaryotes, once the known essential genes exceed 2% of total genes, this approach already comes close to its optimal performance. In eukaryotes, reaching comparable near-optimal performance requires known essential genes amounting to more than 4% of total genes, reflecting the greater complexity of eukaryotic organisms. When few essential genes are known, combining the two approaches improved performance. Our investigation thus provides key information on accurately predicting essential genes and will greatly facilitate the annotation of microbial genomes.
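To make the two strategies concrete, the Python sketch below contrasts them on toy data. It is not the authors' integrative algorithm: a simple nearest-centroid scorer stands in for the trained classifier, all feature values are invented, and the equal-weight averaging used for the combined score is a hypothetical choice rather than the combination scheme used in the study. The hypothetical helper `known_fraction_ok` only encodes the approximate 2% (prokaryote) and 4% (eukaryote) figures stated above.

```python
from math import dist
from statistics import fmean

# Hypothetical representation: each gene is a tuple of numeric features.
Gene = tuple[float, ...]
Model = tuple[Gene, Gene]  # (essential centroid, non-essential centroid)


def centroid(vectors: list[Gene]) -> Gene:
    """Component-wise mean of a list of feature vectors."""
    return tuple(fmean(col) for col in zip(*vectors))


def train(essential: list[Gene], nonessential: list[Gene]) -> Model:
    """Learn one centroid per class; a stand-in for the real classifier."""
    return centroid(essential), centroid(nonessential)


def score(model: Model, gene: Gene) -> float:
    """Return a value in [0, 1]; above 0.5 means closer to the essential centroid."""
    ess, non = model
    d_ess, d_non = dist(gene, ess), dist(gene, non)
    total = d_ess + d_non
    return 0.5 if total == 0 else d_non / total


def combined_score(within: Model, across: Model, gene: Gene, w: float = 0.5) -> float:
    """Hypothetical combination: weighted average of the two strategies' scores."""
    return w * score(within, gene) + (1 - w) * score(across, gene)


def known_fraction_ok(n_known: int, n_total: int, prokaryote: bool = True) -> bool:
    """True if known essential genes exceed ~2% (prokaryotes) or ~4% (eukaryotes) of all genes."""
    threshold = 0.02 if prokaryote else 0.04
    return n_known / n_total > threshold


# Toy usage (all values invented for illustration).
within = train([(0.9, 0.8), (0.8, 0.9)], [(0.1, 0.2), (0.2, 0.1)])  # learn from the target organism
across = train([(0.7, 0.9), (0.9, 0.7)], [(0.3, 0.1), (0.1, 0.3)])  # transfer from a model organism
query = (0.85, 0.75)
print(round(score(within, query), 2), round(score(across, query), 2))
print(round(combined_score(within, across, query), 2))
print(known_fraction_ok(90, 4000))  # 90 / 4000 = 2.25% of genes
```

Keeping the within-organism and cross-organism models separate makes it easy to weight them differently, for example leaning on the transferred model when only a handful of essential genes are known in the target organism.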

Highlights

  • Essential genes are defined as those that, when disrupted, confer a lethal phenotype to microorganisms under defined conditions

  • Because we showed in previous research that our cross-organism approach outperforms homology mapping [24], we did not include homology mapping in the comparisons in this study

  • Among all the characteristic features we considered, we identified 13 that are potentially associated with gene essentiality in Escherichia coli (EC) and that show relatively weak correlations among themselves (Table 1)

Introduction

Essential genes are defined as those that, when disrupted, confer a lethal phenotype to microorganisms under defined conditions. The essentiality of a gene is the indispensability of its product for the survival of a microorganism. A complete understanding of gene essentiality is important in multiple facets of biology, medicine and bioengineering. Because their disruption is lethal, essential genes are often attractive antibiotic targets [1]. The essential genes of an organism constitute its minimal gene set, a key concept in the emerging field of synthetic biology [2,3]. Studying gene essentiality is also a crucial step toward unraveling the complex relationship between genotype and phenotype [4], a fundamental question in genetics.
