Abstract
Identifying disease-causing genes is an important step towards drug design and drug discovery. In disease gene identification and classification, the main aim is to identify disease genes, while identifying non-disease genes is of little or no significance. Hence, this task can be framed as a one-class classification problem. Existing machine learning methods typically treat known disease genes as the positive training set and unknown genes as negative samples to build a binary classification model. Here we propose a new One-class Classification Support Vector Machines (OCSVM) method to precisely classify candidate disease genes. Our aim is to build a model that concentrates on detecting known disease-causing genes in order to increase sensitivity and precision. We evaluate the proposed model on a benchmark built from a gene expression dataset for Acute Myeloid Leukemia (AML). Compared with traditional methods, our experimental results show that the proposed method achieves better precision, recall, and F-measure in detecting disease-causing genes for AML. The OCSVM code and our extracted AML benchmark are publicly available at: https://github.com/imandehzangi/OCSVM.
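The abstract reports results in terms of precision, recall, and F-measure. As a reminder of how these three metrics relate, the following is a minimal sketch using only the Python standard library. The function name `precision_recall_f1` and the toy label vectors are illustrative assumptions; they are not taken from the paper or its repository.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F-measure for the positive class.

    Hypothetical helper for illustration; label 1 stands for "disease gene".
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Toy labels (made up): 1 = disease gene, 0 = non-disease gene.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Recall corresponds to the sensitivity mentioned above: the fraction of true disease genes that the model recovers.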
Highlights
In medicine and pharmacology, it is crucial to understand the mechanism of a disease in order to find an effective treatment method
The main issue with such studies is the lack of a specific technique for retrieving validated negative data from unlabeled samples, which makes their results less reliable. To overcome this limitation, we propose a novel machine learning method that predicts disease-causing genes in Acute Myeloid Leukemia (AML) from gene expression data, based on the concept of one-class classification
We compare the results of traditional two-class classifiers with those of our new one-class classification model
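To make the one-class idea concrete, the sketch below trains a one-class SVM on positive examples only and then scores unlabeled candidates. It is not the authors' implementation: it uses scikit-learn's `OneClassSVM` as a stand-in, and the synthetic data, the feature count, and the `nu` and `gamma` settings are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-in for expression features of known disease genes
# (100 genes x 5 features). Real data would come from the AML benchmark.
known_disease = rng.normal(loc=2.0, scale=0.5, size=(100, 5))

# Unlabeled candidates: 20 that resemble the known disease genes,
# followed by 20 drawn from a clearly different distribution.
candidates = np.vstack([
    rng.normal(loc=2.0, scale=0.5, size=(20, 5)),
    rng.normal(loc=-2.0, scale=0.5, size=(20, 5)),
])

# Fit on positives only; no negative labels are needed.
# nu roughly bounds the fraction of training points treated as outliers.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)
model.fit(known_disease)

# predict() returns +1 for points inside the learned region, -1 otherwise.
pred = model.predict(candidates)
print("flagged as disease-like:", int((pred == 1).sum()), "of", len(pred))
```

A two-class classifier would instead need the unlabeled genes to be treated as negatives, which is the assumption the one-class formulation avoids.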
Summary
It is crucial to understand the mechanism of a disease in order to find an effective treatment method. When dealing with inherited disorders, finding the disease genes is the first step. Genetic disorders occur due to dysfunction of, or disease-causing mutations in, a single gene or a group of genes. Finding disease-related genes experimentally is time-consuming because of the large number of genes involved. Further biological findings therefore rely on computational approaches that predict novel disease genes and so accelerate experiments.