Abstract

Background: Determination of the minimum gene set for cellular life is one of the central goals in biology. Genome-wide essential gene identification has progressed rapidly in certain bacterial species; however, it remains difficult to achieve in most eukaryotic species. Several computational models have recently been developed to integrate gene features and used as alternatives to transfer gene essentiality annotations between organisms.

Results: We first collected features that were widely used by previous predictive models and assessed the relationships between gene features and gene essentiality using a stepwise regression model. We found two issues that could significantly reduce model accuracy: (i) the effect of multicollinearity among gene features and (ii) the diverse and even contrasting correlations between gene features and gene essentiality existing within and among different species. To address these issues, we developed a novel model called feature-based weighted Naïve Bayes model (FWM), which is based on Naïve Bayes classifiers, logistic regression, and genetic algorithm. The proposed model assesses features and filters out the effects of multicollinearity and diversity. The performance of FWM was compared with other popular models, such as support vector machine, Naïve Bayes model, and logistic regression model, by applying FWM to reciprocally predict essential genes among and within 21 species. Our results showed that FWM significantly improves the accuracy and robustness of essential gene prediction.

Conclusions: FWM can remarkably improve the accuracy of essential gene prediction and may be used as an alternative method for other classification work. This method can contribute substantially to the knowledge of the minimum gene sets required for living organisms and the discovery of new drug targets.

Highlights

  • Determination of the minimum gene set for cellular life is one of the central goals in biology

  • Relationship of gene features and gene essentiality: selecting features associated with gene essentiality is fundamental to predicting essential genes in feature-based models

  • To illustrate the possible consequences of different features in essential gene prediction, we investigate the linkages between gene essentiality and gene features in the Saccharomyces cerevisiae (SCE, Table 2A) and Escherichia coli (ECO, Table 2B) genomes, using the stepwise regression model (SRM) combined with forward selection
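
The forward-selection idea behind a stepwise regression model can be illustrated with a minimal sketch. It is not the authors' SRM: the feature names (`conservation`, `expression`, `gc_content`), the toy values, the use of ordinary least squares on a 0/1 essentiality label, and the `min_gain` stopping rule are all assumptions made for illustration. Stepwise procedures usually rely on an F-test or p-value entry criterion instead.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Sketch only: a singular system (e.g. perfectly collinear features)
    is not handled and would raise ZeroDivisionError.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss(rows, y, cols):
    """Residual sum of squares of a least-squares fit of y on an
    intercept plus the feature columns listed in `cols`."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum(
        (t - sum(b * v for b, v in zip(beta, d))) ** 2 for d, t in zip(design, y)
    )


def forward_select(rows, y, names, min_gain=0.1):
    """Greedy forward selection: repeatedly add the feature that lowers the
    RSS the most, and stop once the best remaining feature explains less than
    `min_gain` of the total variation (a hypothetical stopping rule)."""
    selected = []
    total = current = rss(rows, y, [])
    if total == 0:
        return []
    while len(selected) < len(names):
        candidates = [c for c in range(len(names)) if c not in selected]
        scores = {c: rss(rows, y, selected + [c]) for c in candidates}
        best = min(scores, key=scores.get)
        if (current - scores[best]) / total < min_gain:
            break
        selected.append(best)
        current = scores[best]
    return [names[c] for c in selected]


# Toy data, invented for illustration: 8 genes, 3 hypothetical features.
names = ["conservation", "expression", "gc_content"]
rows = [
    [0.90, 0.6, 0.50], [0.80, 0.9, 0.40], [0.85, 0.5, 0.45], [0.70, 0.8, 0.55],
    [0.20, 0.4, 0.50], [0.30, 0.2, 0.40], [0.10, 0.5, 0.55], [0.25, 0.1, 0.45],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = essential, 0 = non-essential

selected = forward_select(rows, y, names, min_gain=0.1)
print(selected)
```

In this toy set `expression` is correlated with `conservation`, so once `conservation` is in the model the correlated feature adds little. This is the kind of redundancy that multicollinearity introduces into feature-based predictors.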


Summary

Introduction

Determination of the minimum gene set for cellular life is one of the central goals in biology. Enhanced knowledge of essential genes promotes an understanding of the primary structure of the complex gene regulatory network in a cell [3,4,5], helps elucidate the relationship between genotype and phenotype [6,7], and helps identify human diseases [8]. Two types of approaches are mainly used to predict and identify essential genes: experimental laboratory techniques and computational techniques. The former inactivate potential essential genes either randomly or systematically, and gene essentiality is then determined from whether the organism survives. The spectrum of gene essentiality varies under different growth conditions [6,17].

