A machine learning framework for performing binary classification on tabular biomedical data

Ádám Szijártó, Bálint Károly Lakatos, Márton Tokodi, Máté Tolvaj, Attila Kovács, Alexandra Fábián, Béla Merkely

doi:10.1556/1647.2023.00109


Open Access


Journal: Imaging
Publication Date: Jun 26, 2023
Citations: 1
License type: CC BY 4.0

Affiliation: Semmelweis University

Abstract

Background and aim

Over the past decades, we have witnessed an immense expansion in the arsenal and performance of machine learning (ML) algorithms. One of the most important fields that could benefit from these advancements is biomedical science. To streamline the training and evaluation of binary classifiers, we constructed a universal and flexible ML framework that uses tabular biomedical data as input.

Methods and results

Our framework requires the input data to be provided as a comma-separated values file, in which rows correspond to subjects and columns represent different features. After reading the content of this file, the framework enables the users to perform outlier detection, handle missing values, rescale features, and tackle class imbalance. Then, hyperparameter tuning, feature selection, and internal validation are performed using nested cross-validation. If an additional dataset is available, the framework also provides the option for external validation. Users may also compute SHapley Additive exPlanations values to interpret the individual predictions of the model and identify the most important features. Our ML framework was implemented in Python (version 3.9), and its source code is freely available via GitHub. In the second part of this paper, we also demonstrate the usage of the framework through a case study from the field of cardiovascular imaging.

Conclusions

The proposed ML framework enables the efficient training and evaluation of binary classifiers on tabular biomedical data. We hope our framework will serve as a useful resource for both learning and research purposes and will promote further innovation.
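The nested cross-validation step mentioned in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the framework's own code: the k-nearest-neighbour classifier, the synthetic two-feature data, and every function name here (`make_folds`, `select_k`, `nested_cv`, `make_toy_data`) are assumptions chosen for illustration. The inner loop tunes a hyperparameter using only the outer-training subjects, and the outer loop scores the tuned model on subjects it has never seen.

```python
import random
from statistics import mean


def make_folds(labels, n_folds, seed):
    """Stratified folds: deal the shuffled positions of each class round-robin."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        positions = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(positions)
        for n, i in enumerate(positions):
            folds[n % n_folds].append(i)
    return folds


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training rows (squared Euclidean)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )[:k]
    positives = sum(label for _, label in nearest)
    return 1 if 2 * positives > len(nearest) else 0


def accuracy(X, y, train_idx, test_idx, k):
    """Fit k-NN on train_idx and return the accuracy on test_idx."""
    train_X = [X[i] for i in train_idx]
    train_y = [y[i] for i in train_idx]
    hits = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
    return hits / len(test_idx)


def select_k(X, y, train_idx, k_grid, n_folds, seed):
    """Inner loop: choose k by cross-validation within the outer-training set."""
    folds = make_folds([y[i] for i in train_idx], n_folds, seed)

    def inner_score(k):
        scores = []
        for h, val_pos in enumerate(folds):
            val = [train_idx[p] for p in val_pos]
            fit = [train_idx[p] for g, fold in enumerate(folds) if g != h for p in fold]
            scores.append(accuracy(X, y, fit, val, k))
        return mean(scores)

    return max(k_grid, key=inner_score)


def nested_cv(X, y, k_grid=(1, 3, 5), outer=5, inner=3, seed=0):
    """Outer loop: estimate performance of the tuned model on held-out folds."""
    outer_folds = make_folds(y, outer, seed)
    scores = []
    for f, test_idx in enumerate(outer_folds):
        train_idx = [i for g, fold in enumerate(outer_folds) if g != f for i in fold]
        best_k = select_k(X, y, train_idx, k_grid, inner, seed + 1 + f)
        scores.append(accuracy(X, y, train_idx, test_idx, best_k))
    return scores


def make_toy_data(n_per_class=30, seed=0):
    """Synthetic stand-in for a subjects-by-features table with a binary label."""
    rng = random.Random(seed)
    X, y = [], []
    for cls, center in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(cls)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    fold_scores = nested_cv(X, y)
    print("outer-fold accuracies:", fold_scores)
    print("mean accuracy:", mean(fold_scores))
```

Because the held-out outer fold never influences the choice of `k`, the mean of the outer-fold scores is a less optimistic performance estimate than one obtained by tuning and evaluating on the same folds. In practice, preprocessing steps such as imputation and rescaling would also need to be fitted inside each training fold for the same reason.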

Full Text