iEsGene-ZCPseKNC: Identify Essential Genes Based on Z Curve Pseudo $k$-Tuple Nucleotide Composition

Jiahai Chen,Qing Liao,Yongmin Liu,Bin Liu

doi:10.1109/access.2019.2952237

Open Access

Abstract

As an important technique in synthetic biology, computational identification of essential genes facilitates the development of related fields such as genome analysis and drug design. The identification of prokaryotic essential genes has been studied extensively, with most work focusing on essential genes in bacteria. Archaea, an important domain of prokaryotes, show high variance in genome size. However, no predictor has been available for essential genes in archaea. In this paper, we developed the first computational predictor of essential genes in archaea, called iEsGene-ZCPseKNC. To capture the sequence patterns of essential genes, we proposed a new feature called Z curve pseudo $k$-tuple nucleotide composition (ZCPseKNC), which incorporates the advantages of both the Z curve and pseudo $k$-tuple nucleotide composition (PseKNC). To overcome the problems caused by the imbalanced training set, the SMOTE algorithm was employed, which further improved the predictive performance of iEsGene-ZCPseKNC. Evaluated with the rigorous jackknife test on a benchmark dataset, iEsGene-ZCPseKNC outperformed the predictors based on Z curve and on PseKNC, indicating that it is useful for identifying essential genes in archaea and would be a powerful tool for genome analysis. A user-friendly web server for the iEsGene-ZCPseKNC predictor is available at http://bliulab.net/iEsGene-ZCPseKNC/.
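For readers unfamiliar with the Z curve, the following is a minimal Python sketch of the standard cumulative Z curve transform of a DNA sequence. It is not the ZCPseKNC feature itself, whose definition is not given on this page; the function name `z_curve` is hypothetical and chosen only for illustration.

```python
def z_curve(seq):
    """Return the cumulative Z curve points (x_n, y_n, z_n) for n = 1..len(seq).

    Illustrative sketch of the standard Z curve only; the paper's ZCPseKNC
    feature is not reproduced here.
    """
    # Per-base steps along the three axes:
    #   x: purine (A, G) vs. pyrimidine (C, T)
    #   y: amino (A, C) vs. keto (G, T)
    #   z: weak (A, T) vs. strong (C, G) hydrogen bonding
    steps = {
        "A": (1, 1, 1),
        "C": (-1, 1, -1),
        "G": (1, -1, -1),
        "T": (-1, -1, 1),
    }
    x = y = z = 0
    points = []
    for base in seq.upper():
        dx, dy, dz = steps[base]  # raises KeyError on ambiguous bases such as N
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


if __name__ == "__main__":
    print(z_curve("ATGC"))
```

Each point summarizes the base composition of the prefix of length n, so the curve as a whole encodes the sequence without loss of information.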

Highlights

Essential genes and their encoded functions are necessary for the survival of an organism [1].
Essential genes are very important for synthetic biology because they are the foundation of genome construction [3].
Previous studies have provided a wide range of biological features for essential gene prediction, including network topology information [4]–[6], homology information [7], [8], gene expression information [9], [10], cell localization [5], and functional domains [10]. These features were combined with state-of-the-art classifiers, such as Support Vector Machines (SVMs) [6], decision trees [4], Naïve Bayes [5], and Non-negative Matrix Factorization (NMF), to construct the predictors.

Summary

INTRODUCTION

Essential genes and their encoded functions are necessary for the survival of an organism [1]. Previous studies have provided a wide range of biological features for essential gene prediction, including network topology information [4]–[6], homology information [7], [8], gene expression information [9], [10], cell localization [5], and functional domains [10]. These features were combined with state-of-the-art classifiers, such as Support Vector Machines (SVMs) [6], decision trees [4], Naïve Bayes [5], and Non-negative Matrix Factorization (NMF), to construct the predictors. However, there is no computational method for predicting essential genes in archaea. In this study, we therefore propose the first predictor to identify essential genes in archaea based only on DNA sequence composition information.
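As a rough illustration of what "DNA sequence composition information" can mean, the sketch below computes plain $k$-tuple nucleotide composition, that is, the normalized frequency of every $k$-mer over the alphabet A, C, G, T. This is only the basic composition vector, not the pseudo components of PseKNC or the ZCPseKNC feature; the function name `kmer_composition` and the choice to skip windows containing ambiguous bases are assumptions made for this example.

```python
from itertools import product


def kmer_composition(seq, k=2):
    """Return the normalized k-mer frequency vector of a DNA sequence.

    The vector has 4**k entries ordered lexicographically over "ACGT".
    Illustrative sketch only; windows with non-ACGT characters are skipped
    (an assumption, not taken from the paper).
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k across the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


if __name__ == "__main__":
    vector = kmer_composition("ACGTACGT", k=2)
    print(len(vector), round(sum(vector), 6))
```

A fixed-length vector of this kind can be fed to a standard classifier regardless of gene length, which is what makes composition-based features convenient for sequence-only prediction.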

FEATURE FUSION AND SELECTION

Findings

RESULT