Abstract

Recent advances in machine learning have made predictive algorithms a central methodology for personalized medicine. We explored an integrated machine learning and genome-wide analysis approach for predicting probable major depressive disorder (MDD) in 9828 individuals from the Taiwan Biobank. We report a genome-wide significant association with probable MDD that has not been previously identified: FBN1 on chromosome 15. We also pinpointed 17 single nucleotide polymorphisms (SNPs) that show evidence both of association with probable MDD and of potential roles as expression quantitative trait loci (eQTLs). To predict probable MDD status, we built prediction models with random undersampling and synthetic minority oversampling, using the 17 eQTL SNPs and eight clinical variables as predictors. We compared five state-of-the-art models: logistic ridge regression, support vector machine, C4.5 decision tree, LogitBoost, and random forests. Random forests achieved the highest performance (area under the curve = 0.8905 ± 0.0088; repeated 10-fold cross-validation) among the predictive algorithms in capturing complex relationships between biomarkers and probable MDD. Our study suggests that an integrated machine learning and genome-wide analysis approach may offer an advantageous method to establish bioinformatics tools for discriminating MDD patients from healthy controls.
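The abstract names two class-balancing steps, random undersampling and synthetic minority oversampling, without describing how they were implemented. The following is a minimal sketch of both ideas using only the Python standard library. It is not the authors' code: the function names, the toy feature rows, and the convention that label 1 marks a probable MDD case are all assumptions made for illustration.

```python
import random


def random_undersample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


def smote(X, y, k=5, seed=0):
    """Add synthetic minority rows by interpolating towards a near minority neighbour.

    Assumes the minority class has at least two rows.
    """
    rng = random.Random(seed)
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    minority, label = (pos, 1) if len(pos) <= len(neg) else (neg, 0)
    n_new = abs(len(pos) - len(neg))  # rows needed to balance the classes
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to `base`.
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return X + synthetic, y + [label] * n_new


# Hypothetical toy data: six controls (0) and two cases (1), two features each.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
     [0.2, 0.4], [0.4, 0.2], [1.0, 1.0], [1.2, 0.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_under, y_under = random_undersample(X, y)
X_over, y_over = smote(X, y, k=1)
print(len(y_under), sum(y_under))  # balanced by removing controls
print(len(y_over), sum(y_over))    # balanced by adding synthetic cases
```

As a general practice, and not something the abstract states about this study, resampling of this kind is usually applied to the training folds only, so that the held-out fold keeps the original class ratio.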

Highlights

  • Significant progress has been made in the interdisciplinary fields of personalized medicine, machine learning, and psychiatry in recent years [1,2,3]

  • We found that our random forests model with synthetic minority oversampling showed the best performance in predicting probable major depressive disorder (MDD) based on 17 eQTL single nucleotide polymorphisms (SNPs) and eight clinical variables

  • We investigated the association between probable MDD and key eQTL SNPs assessed in the genome-wide association study (GWAS)


Summary

Introduction

Significant progress has been made in the interdisciplinary fields of personalized medicine, machine learning, and psychiatry in recent years [1,2,3]. Machine learning models have been investigated to develop predictive algorithms that can help facilitate studies of how genetic variants and clinical variables affect disease status and treatment outcomes in patients [1,2,3]. Such models have also been used to predict disease status in patients with schizophrenia [6,7] from clinical characteristics and genetic variants such as single nucleotide polymorphisms (SNPs). Because of their wide range of potential applications, it has been suggested that machine learning models can play a pivotal role in the future of personalized medicine [8,9,10]. Qi et al. [13] used an extreme gradient boosting machine learning method to predict the severity of MDD from microRNA expression data (AUC = 0.76). A recent study by Arloth et al. [16] reported an integrated machine learning and genome-wide analysis approach to identify regulatory SNPs associated with MDD using expression quantitative trait loci (eQTLs) and methylation quantitative trait loci information.
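The AUC values quoted in this summary can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control. A minimal sketch of that pairwise definition follows, assuming hypothetical labels (1 = case, 0 = control) and hypothetical model scores; it illustrates the metric only and does not reproduce any reported result.

```python
def auc(labels, scores):
    """Area under the ROC curve computed as a pairwise win rate.

    Each (case, control) pair counts 1 if the case scores higher,
    0.5 if the scores tie, and 0 otherwise.
    """
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: three cases and four controls.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]
print(round(auc(labels, scores), 4))  # 10 of 12 pairs ranked correctly -> 0.8333
```

An AUC of 0.5 corresponds to ranking cases and controls no better than chance, and 1.0 to ranking every case above every control.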

