Single-center versus multi-center data sets for molecular prognostic modeling: a simulation study

Daniel Samaga, Horst Zitzelsberger, Kristian Unger, Anne-Laure Boulesteix, Herbert Braselmann, Roman Hornung, Julia Hess, Claus Belka

doi:10.1186/s13014-020-01543-1

Abstract

Background: Prognostic models based on high-dimensional omics data generated from clinical patient samples, such as tumor tissues or biopsies, are increasingly used for prognosis of radio-therapeutic success. The model development process requires two independent discovery and validation data sets. Each of them may contain samples collected in a single center or a collection of samples from multiple centers. Multi-center data tend to be more heterogeneous than single-center data but are less affected by potential site-specific biases. Optimal use of limited data resources for discovery and validation with respect to the expected success of a study requires dispassionate, objective decision-making. In this work, we addressed the impact of the choice of single-center and multi-center data as discovery and validation data sets, and assessed how this impact depends on the three data characteristics signal strength, number of informative features and sample size.

Methods: We set up a simulation study to quantify the predictive performance of a model trained and validated on different combinations of in silico single-center and multi-center data. The standard bioinformatical analysis workflow of batch correction, feature selection and parameter estimation was emulated. For the determination of model quality, four measures were used: false discovery rate, prediction error, chance of successful validation (significant correlation of predicted and true validation data outcome) and model calibration.

Results: In agreement with literature about generalizability of signatures, prognostic models fitted to multi-center data consistently outperformed their single-center counterparts when the prediction error was the quality criterion of interest. However, for low signal strengths and small sample sizes, single-center discovery sets showed superior performance with respect to false discovery rate and chance of successful validation.

Conclusions: With regard to decision making, this simulation study underlines the importance of study aims being defined precisely a priori. Minimization of the prediction error requires multi-center discovery data, whereas single-center data are preferable with respect to false discovery rate and chance of successful validation when the expected signal or sample size is low. In contrast, the choice of validation data solely affects the quality of the estimator of the prediction error, which was more precise on multi-center validation data.
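
The simulation workflow described in the abstract above (in silico cohorts, batch correction, feature selection, parameter estimation, then scoring on a validation set) can be illustrated with a minimal sketch. It is not the authors' implementation. Every name in it (`simulate_cohort`, `batch_correct`, `fit`, `evaluate`, `run`) and every default parameter is hypothetical, and the modeling choices are assumptions made for illustration: site bias is an additive per-center feature shift, batch correction is per-center mean centering, features are ranked by absolute Pearson correlation, and each selected feature gets a marginal least-squares slope. Model calibration and the significance test behind "chance of successful validation" are omitted; only the raw correlation is reported.

```python
import random
from statistics import fmean


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    ma, mb = fmean(a), fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5 if saa > 0 and sbb > 0 else 0.0


def simulate_cohort(rng, n, n_centers, p, informative, signal, batch_sd):
    """Draw n samples spread over n_centers (assumed additive site-specific shifts)."""
    shifts = [[rng.gauss(0, batch_sd) for _ in range(p)] for _ in range(n_centers)]
    X, y, centers = [], [], []
    for i in range(n):
        c = i % n_centers
        z = [rng.gauss(0, 1) for _ in range(p)]  # latent feature values
        y.append(signal * sum(z[j] for j in informative) + rng.gauss(0, 1))
        X.append([z[j] + shifts[c][j] for j in range(p)])  # measured = latent + shift
        centers.append(c)
    return X, y, centers


def batch_correct(X, centers):
    """Naive batch correction: subtract each center's per-feature mean."""
    out = [row[:] for row in X]
    for c in set(centers):
        idx = [i for i, ci in enumerate(centers) if ci == c]
        for j in range(len(X[0])):
            m = fmean(X[i][j] for i in idx)
            for i in idx:
                out[i][j] -= m
    return out


def fit(X, y, k):
    """Keep the k features most correlated with y; estimate one marginal slope each."""
    p = len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    ranked = sorted(range(p), key=lambda j: -abs(pearson(cols[j], y)))[:k]
    my = fmean(y)
    slopes, intercept = {}, my
    for j in ranked:
        mx = fmean(cols[j])
        sxx = sum((x - mx) ** 2 for x in cols[j])
        slopes[j] = sum((x - mx) * (v - my) for x, v in zip(cols[j], y)) / sxx
        intercept -= slopes[j] * mx
    return intercept, slopes


def evaluate(model, X_val, y_val, informative):
    """Score a fitted model: false discovery rate, prediction error, correlation."""
    intercept, slopes = model
    pred = [intercept + sum(b * row[j] for j, b in slopes.items()) for row in X_val]
    fdr = sum(j not in informative for j in slopes) / len(slopes)
    mse = fmean((q - t) ** 2 for q, t in zip(pred, y_val))
    return {"fdr": fdr, "mse": mse, "corr": pearson(pred, y_val)}


def run(seed=0, n=60, p=200, k=5, signal=1.0, batch_sd=1.0):
    """Compare single-center and multi-center discovery sets on one validation set."""
    rng = random.Random(seed)
    informative = set(range(k))
    X_val, y_val, c_val = simulate_cohort(rng, n, 4, p, informative, signal, batch_sd)
    X_val = batch_correct(X_val, c_val)
    results = {}
    for label, n_centers in (("single-center", 1), ("multi-center", 4)):
        X, y, c = simulate_cohort(rng, n, n_centers, p, informative, signal, batch_sd)
        results[label] = evaluate(fit(batch_correct(X, c), y, k), X_val, y_val, informative)
    return results


if __name__ == "__main__":
    for label, metrics in run().items():
        print(label, metrics)
```

A single call to `run` gives one noisy draw per design. A comparison in the spirit of the study would repeat it over many seeds and over a grid of `signal`, `k` and `n`, then average each measure.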

Highlights

Prognostic models based on high-dimensional omics data generated from clinical patient samples, such as tumor tissues or biopsies, are increasingly used for prognosis of radio-therapeutic success.
The need for prognostic factors predicting individual response is great, and a lot of research effort has been invested in the past decade to identify molecular prognostic markers from multi-level omics data generated from clinical patient samples.
We address decision making regarding the choice of single-center (SC) or multi-center (MC) data for discovery and validation cohorts for prognostic modeling, as this is often needed in studies for predicting the outcome of radiotherapy from high-dimensional molecular data.

Summary

Introduction

Prognostic models based on high-dimensional omics data generated from clinical patient samples, such as tumor tissues or biopsies, are increasingly used for prognosis of radio-therapeutic success. The effectiveness of radiotherapy is influenced by a number of factors such as radiation sensitivity, the anatomical borders and immunogenic constitution of the tumor, and its environment [1]. The interplay between these factors is complex, and prediction of the radiation response and overall clinical performance requires detailed measurement of the underlying molecular state of the tissue. This is increasingly attempted through the use of systemic multi-level omics biology approaches [3, 4]. For locally advanced head and neck cancer and glioblastoma, prognostic gene and miRNA signatures predicting local and distant control or overall survival have recently been identified and are promising markers with the potential to allow substratification of standard-therapy treated patients for alternative treatment strategies [9, 10, 11].

Methods

Results

Discussion

Conclusion

Journal: Radiation Oncology
Publication Date: May 14, 2020
Citations: 5
License type: open-access
