Abstract

Despite the widening range of high-throughput platforms and the exponential growth of generated data volume, the validation of biomarkers discovered from large-scale data remains a challenging field. To tackle cancer heterogeneity and cope with the data dimensionality, a number of network and pathway approaches have been developed but rarely applied systematically to this task. We propose a new method, called NEAmarker, for finding sensitive and robust biomarkers at the pathway level. Scores from network enrichment analysis transform the original space of altered genes into a lower-dimensional space of pathways. These dimensions are then correlated with phenotype variables. The method was first tested using in vitro data from three anti-cancer drug screens and then on clinical data from The Cancer Genome Atlas. It proved superior to the single-gene and alternative enrichment analyses in terms of (1) universal applicability to different data types with a possibility of cross-platform integration, (2) consistency of the discovered correlates between independent drug screens, and (3) ability to explain differential survival of treated patients. Our new screen of anti-cancer compounds validated the performance of multivariate models of drug sensitivity. The previously proposed methods of enrichment analysis could achieve comparable levels of performance in certain tests. However, only our method could discover predictors of both in vitro response and patient survival given administration of the same drug.
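The two steps named in the abstract (scoring each sample's altered genes against pathways through a network, then correlating the pathway scores with a phenotype) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names `link_enrichment` and `pearson`, the toy network, and the score itself, which compares observed network links with a simple degree-based expectation and is not claimed to be the statistic used by NEAmarker.

```python
from math import sqrt


def link_enrichment(ags, fgs, edges):
    """Toy network enrichment score for one sample and one pathway.

    Compares the number of network links between the altered gene set (AGS)
    and the functional gene set (FGS) with a degree-based expectation.
    This is a simplified stand-in, not the published NEAmarker statistic.
    """
    ags, fgs = set(ags), set(fgs)
    # Node degrees in the undirected network.
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    # Links that connect an AGS gene to an FGS gene, in either orientation.
    observed = sum(
        1 for a, b in edges
        if (a in ags and b in fgs) or (a in fgs and b in ags)
    )
    # Assumed null model: links proportional to the product of summed degrees.
    k_ags = sum(degree.get(g, 0) for g in ags)
    k_fgs = sum(degree.get(g, 0) for g in fgs)
    expected = k_ags * k_fgs / (2 * len(edges))
    if expected == 0:
        return 0.0
    return (observed - expected) / sqrt(expected)


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical toy network, pathway, samples and drug response values.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]
pathway = {"B", "C"}
sample_ags = [{"A"}, {"D"}, {"F"}]   # one altered gene set per sample
drug_response = [0.9, 0.5, 0.1]      # one phenotype value per sample

# One pathway-level score per sample replaces the gene-level description.
scores = [link_enrichment(ags, pathway, edges) for ags in sample_ags]
r = pearson(scores, drug_response)
```

With many pathways, each sample would be described by a vector of such scores, and each pathway dimension could be tested against the phenotype in the same way.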

Highlights

  • The problem known as the “dimensionality curse”[1,2] - when a small set of biomedical samples is described by a much larger number of molecular variables - undermines the robustness of phenotype predictors

  • An experimental or clinical sample should be characterized by an altered gene set (AGS), such as the top-ranking differentially expressed genes, a set of somatic mutations, or a combination of these

  • In overrepresentation analysis (ORA), enrichment is measured by the number of genes shared between the functional gene sets (FGS) and AGS, normalized by the gene set sizes
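The overlap-based measure described in the last highlight can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `ora_enrichment` is hypothetical, the gene lists are toy examples, and the hypergeometric upper tail is used here as one common way to normalize the overlap by the gene set sizes; the source does not specify which normalization the authors used.

```python
from math import comb


def ora_enrichment(ags, fgs, universe_size):
    """Overlap-based enrichment of an altered gene set (AGS) in a functional gene set (FGS).

    Returns the observed overlap, the overlap expected by chance, and the
    upper-tail hypergeometric probability of an overlap at least this large.
    """
    ags, fgs = set(ags), set(fgs)
    overlap = len(ags & fgs)
    # Expected number of shared genes if the AGS were drawn at random
    # from a universe of `universe_size` genes.
    expected = len(ags) * len(fgs) / universe_size
    # P(shared genes >= overlap) under the hypergeometric distribution.
    max_shared = min(len(ags), len(fgs))
    favourable = sum(
        comb(len(fgs), k) * comb(universe_size - len(fgs), len(ags) - k)
        for k in range(overlap, max_shared + 1)
    )
    p_value = favourable / comb(universe_size, len(ags))
    return overlap, expected, p_value


# Toy example with illustrative gene symbols and an assumed universe of 100 genes.
ags = {"TP53", "KRAS", "EGFR", "BRAF"}
fgs = {"KRAS", "BRAF", "RAF1", "MAP2K1", "MAPK1"}
overlap, expected, p_value = ora_enrichment(ags, fgs, universe_size=100)
```

Because this measure counts only shared genes, an AGS with no member in the FGS always has an overlap of zero, however closely the two sets are connected in an interaction network.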


Summary

Introduction

The problem known as the “dimensionality curse”[1,2] - when a small set (tens to hundreds) of biomedical samples is described by a much larger number of molecular variables - undermines the robustness of phenotype predictors. Oncologists expected reports on patient-specific alterations in the light of knowledge available from computerized support systems[18]. In our view, these challenges could be most systematically addressed by summarizing sparse, disparate events at the pathway level via the global interaction network. One winning strategy was to employ multigenic expression patterns. Such ‘meta-genes’[20] were, despite the seemingly ‘network-free’ definition, nothing other than modules in a co-expression network, which allowed dimensionality reduction and biological generalization. Another DREAM project revealed the efficiency of summarizing gene expression in cancer cell lines over pathways[21]. Further sample classification in a flow of new patients should not require re-running the analysis on the whole cohort, i.e. recalculating the data space, as is often the case.

