Learning curves for drug response prediction in cancer cell lines

Alexander Partin, Yvonne A Evrard, Michael Fonstein, Austin Clyde, Rick L Stevens, Fangfang Xia, Thomas Brettin, Yitan Zhu, Hyunseung Yoo, Songhao Jiang, Maulik Shukla, James H Doroshow

doi:10.1186/s12859-021-04163-y

Abstract

Background

Motivated by the size and availability of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating drug response data, a common question is whether the generalization performance of existing prediction models can be further improved with more training data.

Methods

We utilize empirical learning curves for evaluating and comparing the data scaling properties of two neural networks (NNs) and two gradient boosting decision tree (GBDT) models trained on four cell line drug screening datasets. The learning curves are accurately fitted to a power law model, providing a framework for assessing the data scaling behavior of these models.

Results

The curves demonstrate that no single model dominates in terms of prediction performance across all datasets and training sizes, thus suggesting that the actual shape of these curves depends on the unique pair of an ML model and a dataset. The multi-input NN (mNN), in which gene expressions of cancer cells and molecular drug descriptors are input into separate subnetworks, outperforms a single-input NN (sNN), where the cell and drug features are concatenated for the input layer. In contrast, a GBDT with hyperparameter tuning exhibits superior performance as compared with both NNs at the lower range of training set sizes for two of the tested datasets, whereas the mNN consistently performs better at the higher range of training sizes. Moreover, the trajectory of the curves suggests that increasing the sample size is expected to further improve prediction scores of both NNs. These observations demonstrate the benefit of using learning curves to evaluate prediction models, providing a broader perspective on the overall data scaling characteristics.

Conclusions

A fitted power law learning curve provides a forward-looking metric for analyzing prediction performance and can serve as a co-design tool to guide experimental biologists and computational scientists in the design of future experiments in prospective research studies.
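The power law fit mentioned in the abstract can be illustrated with a minimal sketch. The functional form used here, error ≈ a · m^b + c with training size m, is an assumption (a common three-parameter power law), since the paper's exact expression is not reproduced on this page. The function name `fit_power_law` and the grid-scan procedure are likewise hypothetical and are not the authors' fitting method. The sketch assumes all error values are positive and decrease with training size.

```python
import math


def fit_power_law(sizes, errors, n_grid=200):
    """Fit errors ~= a * sizes**b + c (assumed form, not taken from the paper).

    For each candidate offset c below min(errors), a and b are obtained by
    ordinary least squares on log(error - c) versus log(size). The candidate
    with the smallest squared error in the original units is returned.
    """
    xs = [math.log(m) for m in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    upper = min(errors)  # c must stay strictly below every observed error
    best = None
    for k in range(n_grid):
        c = upper * k / n_grid
        ys = [math.log(e - c) for e in errors]
        mean_y = sum(ys) / n
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        b = sxy / sxx                      # slope in log-log space
        a = math.exp(mean_y - b * mean_x)  # intercept mapped back from logs
        sse = sum((a * m ** b + c - e) ** 2 for m, e in zip(sizes, errors))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1], best[2], best[3]
```

Given measured (training size, error) pairs, the call `a, b, c = fit_power_law(sizes, errors)` returns the three coefficients. A negative `b` indicates that the error is still decreasing with more data, and `c` estimates the error floor the curve approaches.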

Highlights

Motivated by the size and availability of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment.
A single experiment refers to the workflow of generating raw learning curve data (LC_raw) and fitting the power law expression in Eq. (1) to y, q_0.1, and q_0.9 for a given pair of a dataset and an ML model.
Both neural networks (NNs) maintain data scaling properties that are characterized by the power law region, demonstrating a promising trajectory of further improvement. These observations indicate that the power law fits can be used to project the expected error score beyond the available training size or, alternatively, calculate the sample size required to achieve a specific performance. These uses of learning curves can aid in collaboration between experimental biologists and computational scientists to shape a global vision of how predictive models can be further improved.
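The two uses of a fitted curve described in the highlights, projecting the error at a larger training size and computing the sample size needed for a target error, can be sketched as follows. The power law form a · m^b + c and the coefficient names `a`, `b`, `c` are assumptions, because Eq. (1) is not reproduced on this page. The function names are hypothetical.

```python
import math


def projected_error(m, a, b, c):
    """Error predicted at training size m by the assumed form a * m**b + c."""
    return a * m ** b + c


def required_size(target, a, b, c):
    """Smallest integer training size whose predicted error is about target.

    Inverts target = a * m**b + c for m. This is only meaningful for a
    decreasing curve (a > 0, b < 0) and a target above the error floor c.
    """
    if a <= 0 or b >= 0:
        raise ValueError("expected a > 0 and b < 0 for a decreasing curve")
    if target <= c:
        raise ValueError("target is at or below the error floor c")
    return math.ceil(((target - c) / a) ** (1.0 / b))
```

Because any extrapolation assumes the curve stays in its power law region, projections far beyond the largest observed training size should be treated as rough estimates.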

Summary

Introduction

Motivated by the size and availability of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. The standardized protocols of sensitivity assays, along with rapid improvement of technologies for genomic profiling, have led researchers to generate large pharmacogenomic drug response datasets for anticancer drug discovery [4,5,6]. Given the scale and diversity of tumors and compounds in these datasets, ML techniques have become a natural fit for predicting the response of cell lines to drug treatments. By exploring a range of computational approaches and numerical representations of tumors and drugs, researchers strive to develop highly predictive ML drug response models [7,8,9]. Demonstrating the accuracy and robustness of prediction models is essential for identifying their potential utility in clinical applications of cancer treatment, including precision oncology and drug repurposing.

Journal: BMC Bioinformatics
Publication Date: May 17, 2021
Citations: 17
License type: open-access