Abstract

Identifying the essential genes of a given organism is important for understanding their fundamental roles in survival. Uncovering the links between core functions or pathways and these essential genes can provide further insight into the key roles they play. In this study, we investigated the essential and non-essential genes reported in a previous study and extracted the gene ontology (GO) terms and biological pathways that are important for determining essential genes. Using the enrichment theory of GO and KEGG pathways, we encoded each essential or non-essential gene as a vector in which each component represents the relationship between the gene and one GO term or KEGG pathway. The maximum relevance minimum redundancy (mRMR) method was adopted to analyze these relationships. Incremental feature selection (IFS) and a support vector machine (SVM) were then employed to extract the important GO terms and KEGG pathways. A prediction model built at the same time on the extracted GO terms and KEGG pathways yielded nearly perfect performance in distinguishing essential from non-essential genes, with a Matthews correlation coefficient of 0.951. To investigate the key factors underlying the fundamental roles of essential genes, the 21 most important GO terms and three KEGG pathways were analyzed in detail. In addition, several genes that our prediction model predicted to be essential are reported in this study. We suggest that this study provides more functional and pathway information on essential genes and offers a new way to investigate related problems.
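
The evaluation metric and the selection loop named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the `evaluate` callback are hypothetical. In the study, the feature ranking would come from mRMR and the evaluator would be an SVM; here the evaluator is left as a plug-in function so the sketch stays self-contained.

```python
import math
from typing import Callable, List, Sequence, Tuple


def matthews_corrcoef(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels.

    Labels are assumed to be 1 (essential) and 0 (non-essential).
    Returns 0.0 when the denominator is zero (a common convention).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float]:
    """Score nested prefixes of a ranked feature list and keep the best one.

    `ranked_features` is assumed to be ordered from most to least important
    (for example, by mRMR). `evaluate` is a hypothetical callback that trains
    and scores a classifier on the given feature subset and returns a score
    such as the MCC.
    """
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked_features) + 1):
        score = evaluate(ranked_features[:k])
        if score > best_score:  # ties keep the smaller feature set
            best_k, best_score = k, score
    return list(ranked_features[:best_k]), best_score
```

Passing the evaluator in as a callback keeps the selection loop independent of any particular classifier or cross-validation scheme.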

Highlights

  • Genes are known to be the basic molecular units of heredity

  • The results showed that the gene expression programming-based (GEP) classifier outperformed other methods using individual features and received a better AUC score than most of the classifiers trained by various machine learning algorithms [17]

  • It is possible that several gene ontology (GO) terms and KEGG pathways can indicate the differences between essential and non-essential genes


Introduction

The functions of genes have been widely reported to be redundant and reduplicative [1, 2]. Some genes have turned out to be significant for survival, while others appear to be dispensable. To distinguish these two groups of genes and identify the core hereditary regulatory factors, the concept of essential genes was introduced. Essential genes are a group of fundamental genes necessary for a specific organism to survive in a specific environment [3]. According to two reliable and widely cited studies, essential genes are sets of genes that are absolutely required, or indispensable, for the viability of individual human cell types [4, 5]. Essential genes convey fewer selective advantages and may have decreased fitness, escaping from the natural selection
