Identifying disease genes using machine learning and gene functional similarities, assessed through Gene Ontology.

Muhammad Asif,Hugo F M C M Martiniano,Francisco M Couto,Astrid M Vicente

doi:10.1371/journal.pone.0208626

Abstract

Identifying disease genes from a vast amount of genetic data is one of the most challenging tasks in the post-genomic era. Moreover, complex diseases present highly heterogeneous genotypes, which makes biological marker identification difficult. Machine learning methods are widely used to identify these markers, but their performance is highly dependent upon the size and quality of available data. In this study, we demonstrated that machine learning classifiers trained on gene functional similarities, using Gene Ontology (GO), can improve the identification of genes involved in complex diseases. For this purpose, we developed a supervised machine learning methodology to predict complex disease genes. The proposed pipeline was assessed using Autism Spectrum Disorder (ASD) candidate genes. A quantitative measure of gene functional similarities was obtained by employing different semantic similarity measures. To infer the hidden functional similarities between ASD genes, various types of machine learning classifiers were built on quantitative semantic similarity matrices of ASD and non-ASD genes. The classifiers trained and tested on ASD and non-ASD gene functional similarities outperformed previously reported ASD classifiers. For example, a Random Forest (RF) classifier achieved an AUC of 0.80 for predicting new ASD genes, which was higher than that of the previously reported classifier (0.73). Additionally, this classifier was able to predict 73 novel ASD candidate genes that were enriched for core ASD phenotypes, such as autism and obsessive-compulsive behavior. The predicted genes were also enriched for ASD co-occurring conditions, including Attention Deficit Hyperactivity Disorder (ADHD). We also developed a KNIME workflow implementing the proposed methodology, which allows users to configure and execute it without requiring machine learning or programming skills.
Machine learning is an effective and reliable technique to decipher ASD mechanisms by identifying novel disease genes, and this study further demonstrated that the performance of such classifiers can be improved by incorporating a quantitative measure of gene functional similarities. Source code and the workflow of the proposed methodology are available at https://github.com/Muh-Asif/ASD-genes-prediction.
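
The evaluation idea in the abstract (rows of a semantic similarity matrix used as features, with performance summarised as an AUC) can be illustrated with a minimal sketch. This is not the authors' pipeline, which is available in the repository linked above. The gene names and similarity values are invented, and a simple similarity-difference scorer stands in for the Random Forest so that the example needs no third-party libraries.

```python
from typing import Dict, List, Sequence

# Hypothetical toy data: semantic similarity of each test gene to four
# training genes (A and B labelled ASD; X and Y labelled non-ASD).
SIMILARITY: Dict[str, Dict[str, float]] = {
    "C": {"A": 0.8, "B": 0.7, "X": 0.2, "Y": 0.3},
    "D": {"A": 0.4, "B": 0.5, "X": 0.5, "Y": 0.3},
    "Z": {"A": 0.3, "B": 0.2, "X": 0.9, "Y": 0.6},
    "W": {"A": 0.5, "B": 0.5, "X": 0.4, "Y": 0.4},
}
TRAIN_POS: List[str] = ["A", "B"]
TRAIN_NEG: List[str] = ["X", "Y"]
TEST_LABELS: Dict[str, int] = {"C": 1, "D": 1, "Z": 0, "W": 0}


def score(gene: str) -> float:
    """Stand-in classifier (NOT a Random Forest): mean similarity to the
    ASD training genes minus mean similarity to the non-ASD ones."""
    row = SIMILARITY[gene]
    mean_pos = sum(row[g] for g in TRAIN_POS) / len(TRAIN_POS)
    mean_neg = sum(row[g] for g in TRAIN_NEG) / len(TRAIN_NEG)
    return mean_pos - mean_neg


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve as the fraction of positive/negative
    pairs ranked correctly; tied pairs count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


test_genes = list(TEST_LABELS)
scores = [score(g) for g in test_genes]
labels = [TEST_LABELS[g] for g in test_genes]
toy_auc = auc(scores, labels)
print(f"toy AUC = {toy_auc:.2f}")
```

On this toy data three of the four ASD/non-ASD test pairs are ranked correctly, so the AUC is 0.75. The number only illustrates the metric and says nothing about the reported 0.80.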

Highlights

Complex diseases with a strong genetic influence, such as Autism Spectrum Disorder (ASD), often have multiple etiologies with the involvement of possibly hundreds of different genes
The random walk with restart (RWR) algorithm has been widely used for disease gene prediction [4,5]
We evaluated the performance of machine learning methods in predicting ASD candidate genes

Summary

Introduction

Complex diseases with a strong genetic influence, such as Autism Spectrum Disorder (ASD), often have multiple etiologies with the involvement of possibly hundreds of different genes. Large-scale genetic studies of ASD have identified hundreds of candidate disease genes [1,2]. Supervised machine learning methods trace hidden relationships among disease-causing genes in existing datasets, such as gene co-expression profiles, functional similarities, or protein-protein interaction networks, and use this information to discriminate disease genes from non-disease genes [7,8,9]. Krishnan et al. [10] reported a weighted Support Vector Machine (SVM) classifier to predict the probability of association of each brain gene with ASD. To overcome the shortcomings of network-based annotations, studies have used Gene Ontology (GO) (http://www.geneontology.org/) [11], which is a highly efficient resource for predicting disease-causing genes [12].
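
To make the notion of GO-based functional similarity concrete, here is a minimal sketch of one common approach: Resnik term similarity (the information content of the most informative common ancestor) combined into a gene-level score with a best-match average. This excerpt does not state which semantic similarity measures the study used, so the choice is an assumption, and the tiny ontology, term names, and gene annotations below are all hypothetical.

```python
import math
from typing import Dict, List, Set

# Hypothetical toy ontology: term -> parent terms (a DAG, like GO's is_a).
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "signaling": ["root"],
    "metabolism": ["root"],
    "synaptic_signaling": ["signaling"],
    "hormone_signaling": ["signaling"],
    "lipid_metabolism": ["metabolism"],
}

# Hypothetical gene -> annotated terms.
ANNOTATIONS: Dict[str, Set[str]] = {
    "geneA": {"synaptic_signaling"},
    "geneB": {"synaptic_signaling", "hormone_signaling"},
    "geneC": {"lipid_metabolism"},
    "geneD": {"hormone_signaling"},
}


def ancestors(term: str) -> Set[str]:
    """Return the term itself plus every term reachable via PARENTS."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content() -> Dict[str, float]:
    """IC(t) = log(total genes / genes annotated to t or a descendant)."""
    total = len(ANNOTATIONS)
    counts = {term: 0 for term in PARENTS}
    for terms in ANNOTATIONS.values():
        covered: Set[str] = set()
        for term in terms:
            covered |= ancestors(term)
        for term in covered:
            counts[term] += 1
    return {t: math.log(total / c) for t, c in counts.items() if c > 0}


IC = information_content()


def resnik(term_a: str, term_b: str) -> float:
    """Information content of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(IC[t] for t in common)


def gene_similarity(gene_a: str, gene_b: str) -> float:
    """Best-match average of Resnik scores over both genes' terms."""
    terms_a, terms_b = ANNOTATIONS[gene_a], ANNOTATIONS[gene_b]
    best_a = [max(resnik(a, b) for b in terms_b) for a in terms_a]
    best_b = [max(resnik(b, a) for a in terms_a) for b in terms_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


for other in ("geneB", "geneD", "geneC"):
    print("geneA", other, round(gene_similarity("geneA", other), 3))
```

Computing this score for every pair of genes yields a symmetric gene-by-gene matrix, which is the kind of quantitative input on which the abstract says the classifiers were built.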

Journal: PLOS ONE	Publication Date: Dec 10, 2018
Citations: 68	License type: CC BY 4.0
