Abstract

Background: The development of clinical -omic biomarkers for predicting patient prognosis has mostly focused on multi-gene models. However, several studies have described significant weaknesses of multi-gene biomarkers. Indeed, some high-profile reports have even indicated that multi-gene biomarkers fail to consistently outperform simple single-gene ones. Given the continual improvements in -omics technologies and the availability of larger, better-powered datasets, we revisited this “single-gene hypothesis” using new techniques and datasets.

Results: By deeply sampling the population of available gene sets, we compare the intrinsic properties of single-gene biomarkers to multi-gene biomarkers in twelve different partitions of a large breast cancer meta-dataset. We show that simple multi-gene models consistently outperformed single-gene biomarkers in all twelve partitions. We found 270 multi-gene biomarkers (one per ~11,111 sampled) that always made better predictions than the best single-gene model.

Conclusions: The single-gene hypothesis for breast cancer does not appear to retain its validity in the face of improved statistical models, lower-noise genomic technology and better-powered patient cohorts. These results highlight that it is critical to revisit older hypotheses in the light of newer techniques and datasets.

Introduction

The development of clinical -omic biomarkers for predicting patient prognosis has mostly focused on multi-gene models. A biomarker generally consists of two parts: a gene set, chosen for association with prognosis using a supervised or unsupervised feature selection algorithm, and a risk score model that transforms the mRNA abundance levels from these genes into risk scores for a given patient cohort. It has been demonstrated that gene sets selected for prognostic ability using such methods often fail to outperform randomly chosen gene sets of the same size [10, 11]. This is concordant with a previous finding that large numbers of non-overlapping gene sets are associated with breast cancer prognosis [12, 13]. It is apparent that the fundamental properties of multi-gene biomarkers must be fully elucidated in order to identify optimal biomarkers, which is difficult to do even when the feature size is pre-set.

Grzadkowski et al. BMC Bioinformatics (2018) 19:400
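To make the two-part structure of a biomarker concrete, the following minimal Python sketch pairs a gene set with a very simple risk score model, and also draws random gene sets of a fixed size as a baseline population. It is an illustration only and not the authors' method: the averaging of z-scored abundances, the median split into risk groups, the function names and the toy gene names and values are all assumptions introduced here. It also stops short of any survival analysis, which a real prognostic evaluation would require.

```python
import random
import statistics


def zscore(values):
    """Scale one gene's abundances to mean 0 and unit (population) SD."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def risk_scores(abundance, gene_set):
    """Assumed risk model: per-patient mean of z-scored mRNA abundance.

    abundance maps a gene name to a list of per-patient abundance values.
    """
    scaled = [zscore(abundance[gene]) for gene in gene_set]
    n_patients = len(scaled[0])
    return [
        statistics.fmean([gene_values[i] for gene_values in scaled])
        for i in range(n_patients)
    ]


def dichotomize(scores):
    """Assumed grouping rule: split patients at the median risk score."""
    cutoff = statistics.median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def sample_gene_sets(genes, size, n_sets, seed=0):
    """Draw random gene sets of a fixed size (no repeats within a set)."""
    rng = random.Random(seed)
    return [rng.sample(genes, size) for _ in range(n_sets)]


# Hypothetical toy data: three genes measured in four patients.
abundance = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [4.0, 3.0, 2.0, 1.0],
    "GENE_C": [2.0, 2.0, 5.0, 7.0],
}

scores = risk_scores(abundance, ["GENE_A", "GENE_C"])
groups = dichotomize(scores)
random_sets = sample_gene_sets(list(abundance), size=2, n_sets=5)
```

In this sketch, comparing a selected gene set against randomly chosen sets of the same size would mean scoring each entry of `random_sets` with the same `risk_scores` function and evaluating every resulting grouping against patient outcomes with a survival model of one's choice.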

