Abstract

BackgroundBiomarkers are a key component of precision medicine. However, full clinical integration of biomarkers has been met with challenges, partly attributed to analytical difficulties. It has been shown that biomarker reproducibility is susceptible to data preprocessing approaches. Here, we systematically evaluated machine-learning ensembles of preprocessing methods as a general strategy to improve biomarker performance for prediction of survival from early breast cancer.ResultsWe risk stratified breast cancer patients into either low-risk or high-risk groups based on four published hypoxia signatures (Buffa, Winter, Hu, and Sorensen), using 24 different preprocessing approaches for microarray normalization. The 24 binary risk profiles determined for each hypoxia signature were combined using a random forest to evaluate the efficacy of a preprocessing ensemble classifier. We demonstrate that the best way of merging preprocessing methods varies from signature to signature, and that there is likely no ‘best’ preprocessing pipeline that is universal across datasets, highlighting the need to evaluate ensembles of preprocessing algorithms. Further, we developed novel signatures for each preprocessing method and the risk classifications from each were incorporated in a meta-random forest model. Interestingly, the classification of these biomarkers and its ensemble show striking consistency, demonstrating that similar intrinsic biological information are being faithfully represented. As such, these classification patterns further confirm that there is a subset of patients whose prognosis is consistently challenging to predict.ConclusionsPerformance of different prognostic signatures varies with pre-processing method. A simple classifier by unanimous voting of classifications is a reliable way of improving on single preprocessing methods. Future signatures will likely require integration of intrinsic and extrinsic clinico-pathological variables to better predict disease-related outcomes.

Highlights

  • Cancer is fundamentally a disease driven by genetic alterations, with the stepwise accumulation of mutational hits in oncogenes and tumor suppressors [1]

  • We risk stratified breast cancer patients into either low-risk or high-risk groups based on four published hypoxia signatures (Buffa, Winter, Hu, and Sorensen), using 24 different preprocessing approaches for microarray normalization

  • Performance of different prognostic signatures varies with pre-processing method

Read more

Summary

Introduction

Cancer is fundamentally a disease driven by genetic alterations, with the stepwise accumulation of mutational hits in oncogenes and tumor suppressors [1]. The molecular landscape of tumours can vary wildly, leading to differences in progression and overall prognosis These differences are described as genetic heterogeneity, while intra-tumor heterogeneity refers to heterogeneity within a tumor [3,4,5,6]. Treatment decisions for individual patients are largely based on tumor subtype, histology and pathology; clinico-pathological correlation; and tumor size, nodal and metastatic status (TNM stage), along with a few molecular characteristics. This approach does not account for the wide spectrum of genetic burden experienced by the individual patients, leading to divergent responses to therapy that are currently unpredictable. We systematically evaluated machine-learning ensembles of preprocessing methods as a general strategy to improve biomarker performance for prediction of survival from early breast cancer

Objectives
Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call