Towards the prediction of essential genes by integration of network topology, cellular localization and biological process information

Marcio L Acencio, Ney Lemke

doi:10.1186/1471-2105-10-290

Open Access

Journal: BMC Bioinformatics	Publication Date: Sep 16, 2009
Citations: 241	License type: CC BY 2.0

Affiliation: Universidade Estadual Paulista (Unesp)

Abstract

Background: The identification of essential genes is important for the understanding of the minimal requirements for cellular life and for practical purposes, such as drug design. However, the experimental techniques for essential genes discovery are labor-intensive and time-consuming. Considering these experimental constraints, a computational approach capable of accurately predicting essential genes would be of great value. We therefore present here a machine learning-based computational approach relying on network topological features, cellular localization and biological process information for prediction of essential genes.

Results: We constructed a decision tree-based meta-classifier and trained it on datasets with individual and grouped attributes (network topological features, cellular compartments and biological processes) to generate various predictors of essential genes. We showed that the predictors with better performances are those generated by datasets with integrated attributes. Using the predictor with all attributes, i.e., network topological features, cellular compartments and biological processes, we obtained the best predictor of essential genes that was then used to classify yeast genes with unknown essentiality status. Finally, we generated decision trees by training the J48 algorithm on datasets with all network topological features, cellular localization and biological process information to discover cellular rules for essentiality. We found that the number of protein physical interactions, the nuclear localization of proteins and the number of regulating transcription factors are the most important factors determining gene essentiality.

Conclusion: We were able to demonstrate that network topological features, cellular localization and biological process information are reliable predictors of essential genes. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing essentiality.

Highlights

The identification of essential genes is important for the understanding of the minimal requirements for cellular life and for practical purposes, such as drug design
We were able to demonstrate that network topological features, cellular localization and biological process information are reliable predictors of essential genes
We found that the integration of network topology, cellular localization and biological process information in a single predictor increased the predictability of essential genes in comparison with the predictor containing only network topological features
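
As a rough illustration of what training a single predictor on integrated attributes can look like, the sketch below concatenates one column per attribute group into a feature vector and fits bagged decision stumps. It is a minimal stand-in under stated assumptions, not the authors' meta-classifier or the J48 algorithm: the gene records, the column layout and all function names (`fit_stump`, `fit_bagged_stumps`, `predict`, `project`) are hypothetical and invented for this example.

```python
import random
from collections import Counter

# Hypothetical toy records, invented for illustration (NOT data from the paper).
# Columns: (ppi_degree, n_regulating_tfs, is_nuclear, in_process_of_interest)
# Label: 1 = essential, 0 = non-essential.
GENES = [
    ((12, 3, 1, 1), 1),
    ((9, 2, 1, 0), 1),
    ((15, 4, 1, 1), 1),
    ((8, 1, 1, 1), 1),
    ((2, 5, 0, 0), 0),
    ((1, 6, 0, 1), 0),
    ((3, 4, 0, 0), 0),
    ((2, 7, 1, 0), 0),
]


def majority(labels):
    """Most common label; defaults to 0 for an empty list."""
    return Counter(labels).most_common(1)[0][0] if labels else 0


def fit_stump(rows):
    """Fit a one-split decision tree.

    Returns (feature, threshold, label_if_le, label_if_gt) for the split
    with the fewest training errors (first such split on ties).
    """
    best = None
    for f in range(len(rows[0][0])):
        for t in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]


def fit_bagged_stumps(rows, n_models=25, seed=0):
    """Train stumps on bootstrap resamples of the rows (simple bagging)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_models)]


def predict(models, x):
    """Majority vote over all stumps in the ensemble."""
    return majority([l if x[f] <= t else r for f, t, l, r in models])


def project(rows, columns):
    """Keep only the given attribute columns, e.g. a single attribute group."""
    return [(tuple(x[c] for c in columns), y) for x, y in rows]


if __name__ == "__main__":
    ensemble = fit_bagged_stumps(GENES)               # all attribute groups
    topology_only = fit_bagged_stumps(project(GENES, [0, 1]))
    print(predict(ensemble, (10, 2, 1, 1)))
    print(predict(topology_only, (10, 2)))
```

The `project` helper is there to make the comparison in the highlight concrete: the same training routine can be run on one attribute group or on all of them, and the resulting predictors compared on held-out genes. No performance figures are implied by this toy data.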

Summary

Introduction

The identification of essential genes is important for the understanding of the minimal requirements for cellular life and for practical purposes, such as drug design. Essential genes have so far been discovered by experimental procedures such as single gene knockouts [4], RNA interference [5] and conditional knockouts [6], but these techniques are labor-intensive and time-consuming, require a large investment of resources and are not always feasible. Considering these experimental constraints, a computational approach capable of accurately predicting essential genes would be of great value.

Methods

Results

Conclusion