Abstract

Accurate assessment of the association between continuous variables, such as gene expression, and survival is a critical aspect of precision medicine. In this report, we review several available survival analysis and validation tools with reference to published studies that have used them. We identify pitfalls in the assumptions inherent in those applications that can bias scientific research. To overcome these pitfalls, we developed a novel method that extends the logrank test to continuous variables and comprehensively evaluates survival association using derived aggregate statistics. This is accomplished by exhaustively considering all cutpoints across the full expression gradient. Direct side-by-side comparisons, global ROC analysis, and evaluation of the ability to capture relevant biological themes, based on the current understanding of RAS biology, all demonstrated that, within the selected datasets, the new method shows better consistency between multiple datasets of the same disease, better reproducibility and robustness, and better power to detect biological relevance than the available univariate gene expression survival analysis methods and penalized linear model-based methods.

Highlights

  • The realization of precision/personalized medicine requires quantitation of relevant prognostic biomarkers for each individual that guide their diagnosis and treatment

  • We observed several common pitfalls in many publicly available survival analysis web tools that are used for putative biomarker validation or analysis of the association between survival and gene expression data

  • Given the possible variations in sample handling, tumor heterogeneity, gene expression measurement, or metadata collection, it seems unlikely that, even if an association exists between gene expression and survival outcome, every cutpoint used to split the samples into high and low expression groups would yield a significant difference in survival outcome as assessed by the logrank test


Summary

Introduction

Overview of survival analysis with categorical variables

The realization of precision/personalized medicine requires quantitation of relevant prognostic biomarkers for each individual that guide their diagnosis and treatment. Beyond the identification of disease subtypes, one component in the identification of the relevant panel of biomarkers is the association of those variables with patient survival outcomes [1,2,3,4]. These biomarkers can be generalized to represent variables that are categorical with a limited number of discrete values. Categorized or discrete covariates, including clinical features such as pathological cancer stage or genomic features like gene mutation status, can be directly used to classify patients for survival analysis (reviewed in [5]).

The rule at each splitting step of the tree construction is that the random forest searches all cutting points exhaustively and chooses the best one. The advantage of such a procedure is that the method can handle many different types of underlying model effects, such as non-monotone effects or even symmetric structures. Our GradientScanSurv method considers all cutting points exhaustively but resolves them into an aggregate statistic instead of taking the best one as the solution, as the PrognoScan method does (see below in the Side-by-side comparison with the PrognoScan method section), and can therefore benefit from similar advantages.
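The difference between keeping only the best cutpoint and aggregating over all cutpoints can be illustrated with a minimal sketch. It runs a standard two-group logrank test at every split of the sorted expression values and then reports both the minimum p-value (the "best cutpoint" view) and two simple aggregates. The function names `logrank_p` and `scan_cutpoints`, the `min_group` and `alpha` parameters, and the choice of aggregates (median p-value and fraction of significant cutpoints) are assumptions made for illustration; they are not the aggregate statistics defined by GradientScanSurv, which are not given in this excerpt.

```python
import math
from statistics import median


def logrank_p(times, events, in_high):
    """Two-group logrank test; returns (chi-square statistic, p-value)."""
    obs_minus_exp = 0.0
    var = 0.0
    # Iterate over the distinct times at which at least one event occurred.
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        n = sum(1 for ti in times if ti >= t)  # at risk, both groups
        n1 = sum(1 for ti, g in zip(times, in_high) if g and ti >= t)
        d = sum(1 for ti, ei in zip(times, events) if ei and ti == t)
        d1 = sum(1 for ti, ei, g in zip(times, events, in_high)
                 if g and ei and ti == t)
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected events
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / var
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def scan_cutpoints(expr, times, events, min_group=5, alpha=0.05):
    """Run the logrank test at every cutpoint along the expression gradient.

    Samples are split by expression rank, so tied values may fall on
    either side of a cutpoint. The returned aggregates are illustrative
    assumptions, not the published method's statistics.
    """
    n = len(expr)
    order = sorted(range(n), key=lambda i: expr[i])
    pvals = []
    for k in range(min_group, n - min_group + 1):
        high = set(order[k:])  # the n - k samples with highest expression
        in_high = [i in high for i in range(n)]
        pvals.append(logrank_p(times, events, in_high)[1])
    if not pvals:
        raise ValueError("too few samples for the requested min_group")
    return {
        "n_cutpoints": len(pvals),
        "min_p": min(pvals),  # best-cutpoint view
        "median_p": median(pvals),  # aggregate view
        "frac_significant": sum(p < alpha for p in pvals) / len(pvals),
    }
```

Calling `scan_cutpoints(expr, times, events)` with three equal-length lists (expression values, follow-up times, and 0/1 event indicators) returns a dictionary of these summaries. A gene whose `min_p` is small but whose `frac_significant` is low is significant only at isolated cutpoints, which is the situation an aggregate statistic is meant to expose.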
