I-Boost: an integrative boosting approach for predicting survival time with multiple genomics platforms

Kin Yau Wong, Donglin Zeng, Joel S Parker, Charles M Perou, Maki Tanioka, Andrew B Nobel, Cheng Fan, Dan-Yu Lin

doi:10.1186/s13059-019-1640-4


Open Access

https://doi.org/10.1186/s13059-019-1640-4


Abstract

We propose a statistical boosting method, termed I-Boost, to integrate multiple types of high-dimensional genomics data with clinical data for predicting survival time. I-Boost provides substantially higher prediction accuracy than existing methods. By applying I-Boost to The Cancer Genome Atlas, we show that the integration of multiple genomics platforms with clinical variables improves the prediction of survival time over the use of clinical variables alone; gene expression values are typically more prognostic of survival time than other genomics data types; and gene modules/signatures are at least as prognostic as the collection of individual gene expression data.

Highlights

Prediction of disease outcomes, such as individual patient survival time, is critically important for cancer patients.
We evaluated the performance of the methods using three data sets from The Cancer Genome Atlas (TCGA): the lung adenocarcinoma (LUAD) data set, the kidney renal clear cell cancer (KIRC) data set, and a pan-cancer data set derived from approximately 1400 patients that represents eight different tumor types considered by Hoadley et al. [26]; see the “Methods” section for a detailed description of the data sets and the evaluation procedure.
For the LUAD data set, only a few models that contain both clinical and genomic variables provide better prediction than the model with clinical variables only. These results indicate that in certain cancer types, genomic variables contribute to survival prediction in the presence of clinical variables, and the magnitude of the contribution can be large.

Summary

Introduction

Prediction of disease outcomes, such as individual patient survival time, is critically important for cancer patients. Traditional prognostic models that rely solely on clinical variables, such as age and tumor stage, fail to account for the molecular heterogeneity of tumors and may lead to suboptimal treatment decisions [1]. To remedy this situation, many studies have incorporated gene expression data in survival prediction [2,3,4,5]. Large-scale genomics projects such as The Cancer Genome Atlas (TCGA) have generated detailed molecular data on patients with a variety of cancer types. The availability of multiple data types has enabled researchers to address a variety of important questions. Patients can be more precisely classified into molecular subtypes based on

Objectives

Methods

Results

Conclusion