Abstract

To understand the basis of a disease, it is essential to determine its underlying genes. Understanding the association between genes and genetic disease is a fundamental problem in human health. Identifying genes and associating them with a disease requires time-consuming and expensive experiments on a large number of candidate genes. Inexpensive and rapid computational methods have therefore been proposed to identify the candidate genes associated with a disease. Most of these methods rely either on phenotypic similarity, because genes that cause the same or similar diseases show less variation in their sequences, or on network properties of protein-protein interactions, on the premise that genes causing the same or similar diseases lie close together in the protein interaction network. However, these methods use only basic network properties (topological features) and gene sequence information (biological features) as prior knowledge for identifying gene-disease associations, which restricts the identification process to a single gene-disease association. In this study, we propose and analyze novel computational methods for identifying genes associated with diseases. We introduce advanced topological and biological features that are currently overlooked when identifying candidate genes. We evaluate different computational methods on disease-gene association data from DisGeNET using 10-fold cross-validation, based on TP rate, FP rate, precision, recall, F-measure, and the ROC curve. The results show that various computational methods with the advanced feature set outperform previous state-of-the-art techniques, achieving precision up to 93.8%, recall up to 93.1%, and F-measure up to 92.9%. We also apply our methods to four major diseases: Thalassemia, Diabetes, Malaria, and Asthma. Simulation results show that the proposed Deep Extreme Learning Machine (DELM) gives more accurate results than previously published approaches.

Highlights

  • A gene is the basic physical and functional unit of heredity that is responsible for different biological processes in an organism

  • In this study, we propose and analyze novel computational methods for identifying genes associated with diseases, based on advanced biological features

  • Testing with only biological features, Table 3 shows that Random Forest, Classification Via Regression, and Simple CART outperform the other methods, achieving a TP rate up to 93.1%, an FP rate up to 11.8%, precision up to 93.8%, recall up to 93.1%, F-measure up to 92.9%, and an ROC area of 99.1%
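
The evaluation measures quoted above have standard definitions in terms of confusion-matrix counts. The following is a minimal sketch of those definitions together with a plain 10-fold split, assuming a binary (disease gene vs. non-disease gene) setting. The function names `binary_metrics` and `k_fold_indices` are hypothetical and are not taken from the paper, and the counts in the example call are made up for illustration.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute TP rate, FP rate, precision, recall and F-measure
    from the four confusion-matrix counts of a binary classifier."""
    tp_rate = tp / (tp + fn) if (tp + fn) else 0.0    # sensitivity
    fp_rate = fp / (fp + tn) if (fp + tn) else 0.0    # 1 - specificity
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp_rate                                  # recall equals TP rate
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous folds.
    Assumption: samples are already shuffled; no stratification is done."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Illustrative counts only (not results from the paper).
example = binary_metrics(tp=8, fp=2, tn=8, fn=2)
```

In a cross-validation run, the counts from each of the ten test folds would typically be accumulated before calling `binary_metrics`, so that every sample contributes to the reported figures exactly once.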

Summary

INTRODUCTION

A gene is the basic physical and functional unit of heredity that is responsible for different biological processes in an organism. One earlier method, PRINCE, is based on a prioritization function and on constraints that relate to its uniformity over the network and its use of prior information. The technique predicts not only gene associations but also protein complex associations with the disease of concern [10]. The authors of [14] used only degree connectivity and betweenness centrality in their computational pipeline for prioritizing disease-associated genes. They did not use the other topological features that we propose, with which we achieve remarkable results. To analyze different computational techniques, the known disease genes are downloaded from DisGeNET, the gene sequences from UniProt, the binary protein interactions from HPRD (Human Protein Reference Database), and the true human protein complexes from the Comprehensive Resource of Mammalian protein complexes (CORUM). Using all these data resources, we extracted different biological and topological features.
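
Degree connectivity and betweenness centrality, the two topological features mentioned above, can both be computed directly from a list of binary protein interactions. The sketch below is illustrative only: the protein identifiers `P1` to `P4` are hypothetical, the function names are not from the paper, and Brandes' algorithm is used here as a standard way to compute unnormalized betweenness on an unweighted, undirected network, not as a statement of how the authors implemented it.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree(graph):
    """Degree connectivity: number of interaction partners per protein."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes' algorithm, BFS version)."""
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        order = []                                 # nodes in order of discovery
        preds = {v: [] for v in graph}             # shortest-path predecessors
        sigma = dict.fromkeys(graph, 0)            # number of shortest paths
        dist = dict.fromkeys(graph, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:                    # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:         # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):                  # back-propagate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: s / 2 for v, s in score.items()}


# Hypothetical four-protein chain: P1 - P2 - P3 - P4
interactions = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
network = build_graph(interactions)
features = {p: (degree(network)[p], betweenness(network)[p]) for p in network}
```

On this four-protein chain the two interior proteins each have degree 2 and betweenness 2, while the end proteins have degree 1 and betweenness 0. Per-protein values like these could then be placed alongside sequence-derived biological features in a feature vector for classification.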

MATERIAL AND METHODS
FEATURE EXTRACTION
Topological Features
PROPOSED DEEP EXTREME LEARNING MACHINE FRAMEWORK
RESULTS AND DISCUSSION
COMPARATIVE ANALYSIS OF COMPUTATIONAL COST OF NOVEL COMPUTATIONAL METHODS
CONCLUSION AND FUTURE WORK