Abstract

Identifying disease-causing genes is an important step towards drug design and drug discovery. In disease gene identification and classification, the main aim is to identify disease genes, while identifying non-disease genes is of little or no significance. Hence, this task can be framed as a one-class classification problem. Existing machine learning methods typically treat known disease genes as the positive training set and unknown genes as negative samples to build a binary classification model. Here we propose a new One-Class Classification Support Vector Machine (OCSVM) method to precisely classify candidate disease genes. Our aim is to build a model that concentrates on detecting known disease-causing genes in order to increase sensitivity and precision. We evaluate the proposed model on a benchmark consisting of a gene expression dataset for Acute Myeloid Leukemia (AML). Compared with traditional methods, our experimental results show the superiority of the proposed method in terms of precision, recall, and F-measure for detecting disease-causing genes in AML. The OCSVM code and our extracted AML benchmark are publicly available at: https://github.com/imandehzangi/OCSVM.
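The evaluation measures named above (precision, recall, and F-measure) can be illustrated with a minimal sketch in which disease genes are the positive class. The function name and the gene labels are hypothetical placeholders chosen for illustration; they are not taken from the authors' code or from the AML benchmark.

```python
def precision_recall_f1(predicted, actual):
    """Return (precision, recall, F-measure) for the positive class.

    predicted: genes the classifier flags as disease genes
    actual:    genes known to be disease genes
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)

    # Precision: fraction of flagged genes that are true disease genes.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall (sensitivity): fraction of true disease genes that were flagged.
    recall = true_positives / len(actual) if actual else 0.0
    # F-measure: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gene labels, for illustration only.
known_disease_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
predicted_disease_genes = {"GENE_A", "GENE_B", "GENE_E"}

p, r, f = precision_recall_f1(predicted_disease_genes, known_disease_genes)
print(f"precision={p:.3f} recall={r:.3f} F-measure={f:.3f}")
```

Because all three measures are computed with respect to the positive class only, they match a setting in which correctly identifying non-disease genes carries little weight.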

Highlights

  • In medicine and pharmacology, it is crucial to understand the mechanism of a disease in order to find an effective treatment method

  • The main issue with such studies is the lack of a specific technique for retrieving validated negative data from unlabeled samples to produce reliable results. To overcome this limitation, we propose a novel machine learning method that accurately predicts disease-causing genes in Acute Myeloid Leukemia (AML), based on the concept of one-class classification and using gene expression data

  • We compare the results of traditional two-class classifiers with those of our new one-class classification model

Summary

Introduction

It is crucial to understand the mechanism of a disease in order to find an effective treatment method. When dealing with inherited disorders, finding the disease genes is the first step. Genetic disorders occur due to dysfunction of, or disease-causing mutations in, a single gene or a group of genes. Finding disease-related genes experimentally is a time-consuming process because of the large number of genes. Further biological findings therefore rely on computational approaches that accelerate experiments by predicting novel disease genes.
