Exploring machine learning strategies for predicting cardiovascular disease risk factors from multi-omic data

Gabin Drouard,Joona Pohjonen,Matti Pirinen,Jarkko Heiskanen,Jaakko Kaprio,Katja Pahkala,Juha Mykkänen,Olli Raitakari,Saku Ruohonen,Samuli Ripatti,Xiaoling Wang,Miina Ollikainen,Terho Lehtimäki

doi:10.1186/s12911-024-02521-3

Abstract

Background: Machine learning (ML) classifiers are increasingly used for predicting cardiovascular disease (CVD) and related risk factors using omics data, although these outcomes often exhibit categorical nature and class imbalances. However, little is known about which ML classifier, omics data, or upstream dimension reduction strategy has the strongest influence on prediction quality in such settings. Our study aimed to illustrate and compare different machine learning strategies to predict CVD risk factors under different scenarios.

Methods: We compared the use of six ML classifiers in predicting CVD risk factors using blood-derived metabolomics, epigenetics and transcriptomics data. Upstream omic dimension reduction was performed using either unsupervised or semi-supervised autoencoders, whose downstream ML classifier performance we compared. CVD risk factors included systolic and diastolic blood pressure measurements and ultrasound-based biomarkers of left ventricular diastolic dysfunction (LVDD; E/e' ratio, E/A ratio, LAVI) collected from 1,249 Finnish participants, of which 80% were used for model fitting. We predicted individuals with low, high or average levels of CVD risk factors, the latter class being the most common. We constructed multi-omic predictions using a meta-learner that weighted single-omic predictions. Model performance comparisons were based on the F1 score. Finally, we investigated whether learned omic representations from pre-trained semi-supervised autoencoders could improve outcome prediction in an external cohort using transfer learning.

Results: Depending on the ML classifier or omic used, the quality of single-omic predictions varied. Multi-omics predictions outperformed single-omics predictions in most cases, particularly in the prediction of individuals with high or low CVD risk factor levels. Semi-supervised autoencoders improved downstream predictions compared to the use of unsupervised autoencoders. In addition, median gains in Area Under the Curve by transfer learning compared to modelling from scratch ranged from 0.09 to 0.14 and 0.07 to 0.11 units for transcriptomic and metabolomic data, respectively.

Conclusions: By illustrating the use of different machine learning strategies in different scenarios, our study provides a platform for researchers to evaluate how the choice of omics, ML classifiers, and dimension reduction can influence the quality of CVD risk factor predictions.
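As a rough illustration of two ideas named in the abstract (a meta-learner that weights single-omic predictions, and comparison by F1 score across the low, average and high classes), the sketch below combines per-omic class probabilities with a weighted average and scores the result per class. It is not the authors' implementation: the abstract does not specify the form of the meta-learner, so the weighted average, the function names (`per_class_f1`, `combine`, `predict`), the weights and the toy probabilities are all assumptions made for illustration.

```python
# Illustrative sketch only; the weighted-average meta-learner, the weights and
# the toy probabilities below are hypothetical, not taken from the paper.

CLASSES = ("low", "average", "high")


def per_class_f1(y_true, y_pred, classes=CLASSES):
    """Return {class: F1}, computed one-vs-rest for each class."""
    scores = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def combine(prob_by_omic, weights):
    """Weighted average of single-omic class probabilities for one sample."""
    total = sum(weights[o] for o in prob_by_omic)
    return {
        c: sum(weights[o] * prob_by_omic[o][c] for o in prob_by_omic) / total
        for c in CLASSES
    }


def predict(prob_by_omic, weights):
    """Pick the class with the highest combined probability."""
    combined = combine(prob_by_omic, weights)
    return max(CLASSES, key=lambda c: combined[c])


if __name__ == "__main__":
    # Hypothetical per-omic weights (in practice these would be learned).
    weights = {"metabolomics": 1.0, "epigenetics": 1.0, "transcriptomics": 2.0}

    # Toy samples: (single-omic class probabilities, true label).
    samples = [
        ({"metabolomics": {"low": 0.6, "average": 0.3, "high": 0.1},
          "epigenetics": {"low": 0.4, "average": 0.5, "high": 0.1},
          "transcriptomics": {"low": 0.7, "average": 0.2, "high": 0.1}}, "low"),
        ({"metabolomics": {"low": 0.2, "average": 0.6, "high": 0.2},
          "epigenetics": {"low": 0.1, "average": 0.7, "high": 0.2},
          "transcriptomics": {"low": 0.2, "average": 0.5, "high": 0.3}}, "average"),
        ({"metabolomics": {"low": 0.1, "average": 0.5, "high": 0.4},
          "epigenetics": {"low": 0.1, "average": 0.3, "high": 0.6},
          "transcriptomics": {"low": 0.1, "average": 0.2, "high": 0.7}}, "high"),
    ]

    y_true = [label for _, label in samples]
    y_pred = [predict(probs, weights) for probs, _ in samples]
    print(per_class_f1(y_true, y_pred))
```

Reporting F1 per class rather than overall accuracy matters here because the average class is the most common; a classifier that always predicts "average" would look accurate while scoring zero F1 on the low and high classes.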

Journal: BMC medical informatics and decision making	Publication Date: May 2, 2024
Citations: 2	License type: CC BY 4.0

Similar Papers

Effect of body mass index‐z score on adverse levels of cardiovascular disease risk factors
Keisuke Katsuren ... Takao Ohta
Pediatrics International | VOL. 54
22 Dec 2011

Association of anthropometric indices with cardiovascular disease risk factors among children and adolescents: CASPIAN Study
Roya Kelishadi ... Mohammad Mehdi Riazi
International Journal of Cardiology | VOL. 117
21 Jul 2006

Effects of Cardiorespiratory Fitness on Cardiovascular Disease Risk Factors and Telomere Length by Age and Obesity.
Yun-A Shin ... Jae-Hyun Kim
Journal of obesity & metabolic syndrome | VOL. 32
30 Sep 2023

Unique Cardiovascular Disease Risk Factors in Hispanic Individuals.
Sofia Gomez ... Vanessa Blumer
Current Cardiovascular Risk Reports | VOL. 16
02 Jun 2022
