Model selection for metabolomics: predicting diagnosis of coronary artery disease using automated machine learning.

Alena Orlenko, Pekka Kuukasjärvi, Pekka J Karhunen, Mika Kähönen, Daniel Kofink, Leo-Pekka Lyytikäinen, Kjell Nikus, Jari O Laurikka, Jason H Moore, Terho Lehtimäki, Folkert W Asselbergs, Pashupati Mishra, Janet Kelso

doi:10.1093/bioinformatics/btz796

Abstract

Motivation: Selecting the optimal machine learning (ML) model for a given dataset is often challenging. Automated ML (AutoML) has emerged as a powerful tool for enabling the automatic selection of ML methods and parameter settings for the prediction of biomedical endpoints. Here, we apply the tree-based pipeline optimization tool (TPOT) to predict angiographic diagnoses of coronary artery disease (CAD). With TPOT, ML models are represented as expression trees and optimal pipelines are discovered using a stochastic search method called genetic programming. We provide some guidelines for TPOT-based ML pipeline selection and optimization based on various clinical phenotypes and high-throughput metabolic profiles in the Angiography and Genes Study (ANGES).

Results: We analyzed nuclear magnetic resonance-derived lipoprotein and metabolite profiles in the ANGES cohort with the goal of identifying the role of non-obstructive CAD patients in CAD diagnostics. We performed a comparative analysis of TPOT-generated ML pipelines with selected ML classifiers, optimized with a grid search approach, applied to two phenotypic CAD profiles. TPOT-generated ML pipelines outperformed grid search-optimized models across multiple performance metrics, including balanced accuracy and area under the precision-recall curve. With the selected models, we demonstrated that the phenotypic profile that distinguishes non-obstructive CAD patients from no CAD patients is associated with higher precision, suggesting a discrepancy in the underlying processes between these phenotypes.

Availability and implementation: TPOT is freely available via http://epistasislab.github.io/tpot/.

Supplementary information: Supplementary data are available at Bioinformatics online.
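The two performance metrics named in the abstract can be illustrated with a small worked example. The sketch below is not the authors' evaluation code; it computes balanced accuracy and a step-wise estimate of the area under the precision-recall curve (often called average precision) from scratch. The function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / pos + tn / neg)


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve.

    Samples are ranked by descending score and precision is accumulated at
    the rank of each positive sample. Tied scores get no special handling.
    """
    pos = sum(1 for t in y_true if t == 1)
    if pos == 0:
        raise ValueError("at least one positive sample is required")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / pos


# Toy labels and classifier scores, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(balanced_accuracy(y_true, y_pred))
print(average_precision(y_true, scores))
```

Balanced accuracy weights both classes equally regardless of their size, and the precision-recall curve focuses on the positive class, which is why both are commonly reported when class frequencies differ.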

Highlights

Predictive analysis in biomedical research is typically based on deriving quantitative measures of confidence through the creation and fitting of a hypothesis-specific probability model. Machine learning (ML)-based algorithms, by contrast, offer a wide range of techniques that focus on prediction through pattern recognition, with minimal underlying assumptions about the features.
Uncertainty in ML model selection comes from the number of pre-processing algorithms, such as feature selectors and feature transformers (computational algorithms that transform the dataset through feature pre-processing, reduce the dimensionality of the feature set, or generate new features from existing ones), that might be needed to enrich the data for signal.
Section 3.1, Model selection with tree-based pipeline optimization tool (TPOT): Table 1(A) summarizes the comparative analysis of model selection by the TPOT optimization process and by grid search parameter tuning for the P1 phenotype.

Summary

Introduction

Predictive analysis in biomedical research is typically based on deriving quantitative measures of confidence through the creation and fitting of a hypothesis-specific probability model. Machine learning (ML)-based algorithms, by contrast, offer a wide range of techniques that focus on prediction through pattern recognition, with minimal underlying assumptions about the features. ML is especially effective when features are involved in nonlinear interactions or when no strong scientific hypothesis about feature interactions is established. Automated ML (AutoML) seeks to take the guesswork out of model selection by treating ML algorithms and pre-processing methods as building blocks for pipelines that are constructed and evaluated using a search algorithm.
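The idea of treating pre-processing methods and classifiers as interchangeable building blocks can be sketched in a few lines. The toy example below is not TPOT and does not reproduce its genetic programming over expression trees; it is a simplified, mutation-only evolutionary search over two-step pipelines, written in pure Python. Every name in it (`TRANSFORMS`, `CLASSIFIERS`, `search`, and so on), the synthetic data, and the search settings are assumptions made for illustration. For brevity, candidates are scored on a single hold-out split; a real workflow would typically score them with cross-validation and keep a separate test set.

```python
import random
from typing import Callable, Dict, List, Tuple

Rows = List[List[float]]
Pipeline = Tuple[str, str]  # (transform name, classifier name)

# Stateless feature transformers: each maps a feature table to a new one.
TRANSFORMS: Dict[str, Callable[[Rows], Rows]] = {
    "identity": lambda rows: [list(r) for r in rows],
    "add_product": lambda rows: [list(r) + [r[0] * r[1]] for r in rows],
    "absolute": lambda rows: [[abs(v) for v in r] for r in rows],
}


def _dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest_centroid(X_tr, y_tr, X_te):
    """Assign each test row to the class with the closest training mean."""
    centroids = {}
    for label in sorted(set(y_tr)):
        members = [x for x, y in zip(X_tr, y_tr) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return [min(centroids, key=lambda c: _dist(x, centroids[c])) for x in X_te]


def one_nearest_neighbor(X_tr, y_tr, X_te):
    """Assign each test row the label of its closest training row."""
    return [min(zip(X_tr, y_tr), key=lambda p: _dist(x, p[0]))[1] for x in X_te]


CLASSIFIERS = {"nearest_centroid": nearest_centroid, "one_nn": one_nearest_neighbor}


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall for binary labels 0 and 1."""
    recalls = []
    for label in (0, 1):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(1 for i in idx if y_pred[i] == label) / len(idx))
    return sum(recalls) / 2


def evaluate(pipe: Pipeline, X_tr, y_tr, X_te, y_te) -> float:
    """Apply the pipeline's transform, then its classifier, and score it."""
    transform, classify = TRANSFORMS[pipe[0]], CLASSIFIERS[pipe[1]]
    return balanced_accuracy(y_te, classify(transform(X_tr), y_tr, transform(X_te)))


def mutate(pipe: Pipeline, rng: random.Random) -> Pipeline:
    """Swap either the transform or the classifier for a random alternative."""
    if rng.random() < 0.5:
        return (rng.choice(sorted(TRANSFORMS)), pipe[1])
    return (pipe[0], rng.choice(sorted(CLASSIFIERS)))


def search(X_tr, y_tr, X_te, y_te, generations=5, pop_size=4, seed=0):
    """Mutation-only evolutionary search that always keeps the best pipelines."""
    rng = random.Random(seed)
    population = [("identity", "nearest_centroid")]  # plain baseline seed
    while len(population) < pop_size:
        population.append(
            (rng.choice(sorted(TRANSFORMS)), rng.choice(sorted(CLASSIFIERS)))
        )

    def score(p):
        return evaluate(p, X_tr, y_tr, X_te, y_te)

    for _ in range(generations):
        children = [mutate(p, rng) for p in population]
        population = sorted(population + children, key=score, reverse=True)[:pop_size]
    best = population[0]
    return best, score(best)


# Synthetic data with a pure interaction effect: the class depends only on
# the sign of x0 * x1, so neither feature is informative on its own.
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(200)]
y = [1 if a * b > 0 else 0 for a, b in X]
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]

baseline = evaluate(("identity", "nearest_centroid"), X_train, y_train, X_test, y_test)
best_pipeline, best_score = search(X_train, y_train, X_test, y_test)
print("baseline:", baseline)
print("best pipeline:", best_pipeline, "score:", best_score)
```

Because the baseline pipeline is seeded into the initial population and the best candidates are always retained, the search can never return a pipeline that scores worse than the baseline on the chosen split.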

Journal: Bioinformatics	Publication Date: Nov 8, 2019
Citations: 42	License type: CC BY 4.0

Similar Papers

Hyperparameter Tuning and Pipeline Optimization via Grid Search Method and Tree-Based AutoML in Breast Cancer Prediction.
Siti Fairuz Mat Radzi ... Mohd Amiruddin Abd Rahman
Journal of Personalized Medicine | VOL. 11
29 Sep 2021

Genetic Analysis of Coronary Artery Disease Using Tree-Based Automated Machine Learning Informed By Biology-Based Feature Selection.
Elisabetta Manduchi ... Trang T Le
IEEE/ACM transactions on computational biology and bioinformatics | VOL. 19
01 May 2022

Noninvasive Diagnosis of Coronary Artery Disease in Patients With Diabetes by Dobutamine Stress Real-Time Myocardial Contrast Perfusion Imaging
Abdou Elhendy ... Anna C Mcgrain
Diabetes Care | VOL. 28
27 Jun 2005

FLOW PATTERN PREDICTION IN HORIZONTAL AND INCLINED PIPES USING TREE-BASED AUTOMATED MACHINE LEARNING
Agash Uthayasuriyan ... Jeyakumar Gurusamy
Rudarsko-geološko-naftni zbornik | VOL. 39
01 Jan 2024
