Investigation of model stacking for drug sensitivity prediction

Kevin Matlock, Ranadip Pal, Carlos De Niz, Raziur Rahman, Souparno Ghosh

doi:10.1186/s12859-018-2060-2

Open Access

Journal: BMC Bioinformatics
Publication Date: Mar 1, 2018
Citations: 39
License type: open-access

Affiliation: Texas Tech University

Abstract

Background: A significant problem in precision medicine is the prediction of drug sensitivity for individual cancer cell lines. Predictive models such as Random Forests have shown promising performance when predicting from individual genomic features such as gene expression. However, the availability of other data types, including information on multiple tested drugs, calls for predictive models that incorporate these data types.

Results: We explore the predictive performance of model stacking and the effect of stacking on predictive bias and squared error. In addition, we discuss the analytical underpinnings supporting the advantages of stacking in reducing squared error and the inherent bias of random forests in the prediction of outliers. The framework is tested on a setup including gene expression, drug target, physical properties and drug response information for a set of drugs and cell lines.

Conclusion: The performance of individual and stacked models is compared. We note that stacking models built on two heterogeneous datasets provides superior performance to stacking different models built on the same dataset. It is also noted that stacking provides a noticeable reduction in the bias of our predictors when the dominant eigenvalue of the principal axis of variation in the residuals is significantly higher than the remaining eigenvalues.

Highlights

A significant problem in precision medicine is the prediction of drug sensitivity for individual cancer cell lines
We examine the stacking of predictive models and their influence on prediction accuracy and modeling bias
We explore the theoretical underpinnings of the effect of stacking on mean squared error and how stacking produces results that are no worse than the worst individual model
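
The bound on mean squared error mentioned in the highlights can be illustrated for one simple case. The sketch below assumes that the stacked prediction is a convex combination of M individual predictions; the symbols (w_i, \hat{y}_i, MSE_i) are introduced here for illustration only, and the paper's own analysis may be stated differently or more generally.

```latex
% Assumption: stacked prediction is a convex combination of M base predictions.
\hat{y} = \sum_{i=1}^{M} w_i \hat{y}_i,
\qquad w_i \ge 0, \qquad \sum_{i=1}^{M} w_i = 1

% Because the weights sum to one, the stacked error is the same
% convex combination of the individual errors.
y - \hat{y} = \sum_{i=1}^{M} w_i \, (y - \hat{y}_i)

% Squaring is convex, so Jensen's inequality gives a pointwise bound.
(y - \hat{y})^2 \le \sum_{i=1}^{M} w_i \, (y - \hat{y}_i)^2

% Taking expectations, with MSE_i = E[(y - \hat{y}_i)^2]:
\mathrm{E}\big[(y - \hat{y})^2\big]
  \le \sum_{i=1}^{M} w_i \, \mathrm{MSE}_i
  \le \max_{1 \le i \le M} \mathrm{MSE}_i
```

Under this assumption, any choice of convex weights yields a stacked model whose mean squared error does not exceed that of the worst individual model.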

Summary

Introduction

A significant problem in precision medicine is the prediction of drug sensitivity for individual cancer cell lines. Predictive models such as Random Forests have shown promising performance when predicting from individual genomic features such as gene expression. We examine the stacking of predictive models and its influence on prediction accuracy and modeling bias. The principal individual model considered in this article is Random Forests (RF), since previously reported studies [1,2,3,4] have shown RF to outperform multiple other approaches in drug sensitivity prediction applications. To demonstrate the role of stacking in accuracy and bias reduction, we created a drug sensitivity prediction setup with multiple data sources.
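
To make the idea of stacking concrete, the following minimal sketch combines the predictions of two individual models with a single convex weight chosen to minimize squared error. It is not the article's implementation: the response values and the two prediction vectors (`pred_expr`, `pred_target`) are hypothetical numbers standing in for the outputs of models trained on two different data sources, and all function names are illustrative. For brevity the weight is fitted on the same samples it is evaluated on; in practice it would be fitted on held-out or out-of-fold predictions.

```python
from typing import List, Sequence


def mse(pred: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error between predictions and observed responses."""
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


def convex_stack_weight(
    pred_a: Sequence[float], pred_b: Sequence[float], y: Sequence[float]
) -> float:
    """Weight w in [0, 1] minimizing the MSE of w * pred_a + (1 - w) * pred_b.

    The objective is a convex quadratic in w, so clipping the unconstrained
    least-squares solution to [0, 1] gives the constrained optimum.
    """
    diff = [a - b for a, b in zip(pred_a, pred_b)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        return 0.5  # identical predictions: every weight gives the same result
    num = sum(d * (t - b) for d, t, b in zip(diff, y, pred_b))
    return min(1.0, max(0.0, num / denom))


def stack(
    pred_a: Sequence[float], pred_b: Sequence[float], w: float
) -> List[float]:
    """Stacked prediction as a convex combination of two base predictions."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


# Hypothetical normalized drug responses for five cell lines.
y = [0.10, 0.35, 0.50, 0.80, 0.95]
# Hypothetical predictions from a model built on gene expression features.
pred_expr = [0.30, 0.40, 0.50, 0.65, 0.70]
# Hypothetical predictions from a model built on drug target features.
pred_target = [0.05, 0.45, 0.40, 0.90, 0.85]

w = convex_stack_weight(pred_expr, pred_target, y)
stacked = stack(pred_expr, pred_target, w)

print(f"weight on expression model: {w:.3f}")
print(f"MSE expression model: {mse(pred_expr, y):.5f}")
print(f"MSE target model:     {mse(pred_target, y):.5f}")
print(f"MSE stacked model:    {mse(stacked, y):.5f}")
```

Because the weights 0 and 1 are both allowed, the fitted combination cannot have a larger squared error than either individual model on the samples used to fit the weight. Whether that advantage carries over to unseen cell lines depends on how different the errors of the individual models are.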
