Abstract

By classifying patients into subgroups, clinicians can provide more effective care than they could with a uniform approach for all patients. Such subgroups might include patients with a particular disease subtype, patients with a good (or poor) prognosis, or patients most (or least) likely to respond to a particular therapy. Transcriptomic measurements reflect the downstream effects of genomic and epigenomic variations. However, high-throughput technologies generate thousands of measurements per patient, and complex dependencies exist among genes, so it may be infeasible to classify patients using traditional statistical models. Machine-learning classification algorithms can help with this problem. However, hundreds of classification algorithms exist, and most support diverse hyperparameters, so it is difficult for researchers to know which are optimal for gene-expression biomarkers. We performed a benchmark comparison, applying 52 classification algorithms to 50 gene-expression datasets (143 class variables). We evaluated algorithms that represent diverse machine-learning methodologies and have been implemented in general-purpose, open-source, machine-learning libraries. When available, we combined clinical predictors with gene-expression data. Additionally, we evaluated the effects of performing hyperparameter optimization and feature selection using nested cross-validation. Kernel- and ensemble-based algorithms consistently outperformed other types of classification algorithms; however, even the top-performing algorithms performed poorly in some cases. Hyperparameter optimization and feature selection typically improved predictive performance, and univariate feature-selection algorithms typically outperformed more sophisticated methods. Together, our findings illustrate that algorithm performance varies considerably when other factors are held constant and thus that algorithm selection is a critical step in biomarker studies.
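To make the nested cross-validation procedure concrete, the following is a minimal sketch assuming Python with scikit-learn, not the paper's ShinyLearner pipeline; the synthetic dataset, the SVM classifier, and the hyperparameter grid are illustrative placeholders rather than the study's actual configuration. The inner loop selects hyperparameters on training folds only, while the outer loop estimates generalization performance, so tuning never sees the held-out test data.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
    from sklearn.svm import SVC

    # Synthetic stand-in for a gene-expression matrix (samples x genes).
    X, y = make_classification(n_samples=200, n_features=500, n_informative=20,
                               random_state=0)

    # Inner loop tunes hyperparameters; outer loop estimates generalization.
    inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

    # Hypothetical grid for one kernel-based algorithm (an RBF-kernel SVM).
    param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 1e-3]}
    tuned_svm = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=inner_cv,
                             scoring="roc_auc")

    scores = cross_val_score(tuned_svm, X, y, cv=outer_cv, scoring="roc_auc")
    print(f"Nested-CV AUROC: {scores.mean():.3f} +/- {scores.std():.3f}")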

Highlights

  • Source code can be found at https://github.com/srp33/ShinyLearner

  • Previous benchmarks have not systematically evaluated the benefits of optimizing an algorithm’s hyperparameters versus using defaults. We address this gap with a benchmark study spanning 50 datasets (143 class variables representing diverse phenotypes), 52 classification algorithms (1116 hyperparameter combinations), and 14 feature-selection algorithms (a univariate feature-selection sketch follows this list).

  • We evaluated the predictive performance of 52 classification algorithms on 50 gene-expression datasets
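As noted in the highlights above, the benchmark also evaluated feature-selection algorithms, and the abstract reports that univariate filters typically outperformed more sophisticated methods. Below is a minimal sketch of that setup, again assuming scikit-learn rather than ShinyLearner; SelectKBest with an ANOVA F-test stands in for a univariate filter, and the dataset, the number of retained features, and the random-forest settings are illustrative. Placing selection inside the pipeline ensures features are chosen from training folds only, which avoids selection bias during cross-validation.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline

    # Synthetic stand-in for a gene-expression matrix (samples x genes).
    X, y = make_classification(n_samples=200, n_features=500, n_informative=20,
                               random_state=0)

    # Univariate filter (ANOVA F-test) feeding an ensemble classifier; the
    # pipeline refits the filter on each training fold, not the full dataset.
    pipe = Pipeline([
        ("select", SelectKBest(score_func=f_classif, k=50)),
        ("classify", RandomForestClassifier(n_estimators=200, random_state=0)),
    ])

    scores = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
    print(f"Cross-validated AUROC with univariate selection: {scores.mean():.3f}")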


Introduction

Researchers use observational data to derive categories, or classes, into which patients can be assigned. Such classes might include patients who have a given disease subtype, patients at a particular disease stage, patients who respond to a particular treatment, patients who have poor outcomes, patients who have a particular genomic lesion, etc. A key challenge is defining objective and reliable criteria for assigning individual patients to known class labels. When such criteria have been identified and sufficiently validated, they can be used in medical “expert systems” to classify individual patients [4].

