Abstract 2100: Imputation of drug sensitivity levels in The Cancer Genome Atlas using machine learning identifies novel predictors of chemotherapeutic response

Paul Geeleher, Fan Wang, R. Stephanie Huang, Gladys Morrison

doi:10.1158/1538-7445.am2016-2100

Abstract

The Cancer Genome Atlas (TCGA) is a landmark undertaking that has collected genomic data from the tumors of almost 15,000 cancer patients. Most tumors have been assayed using multiple high-throughput genomics techniques, yielding whole-genome DNA copy number, gene expression, mutation status and DNA methylation data. The project has also compiled a large set of matching clinical data; however, the utility of TCGA for pharmacogenomic discovery has been severely limited by the lack of cleanly measured drug response data. Such data are missing because they are difficult to obtain from cancer patients: individuals are typically treated with complex multi-drug regimens, which makes the precise measurement of a drug-specific response phenotype intractable. Precisely measured drug response data can, however, be readily obtained in pre-clinical models such as cell lines. Thus, in this study, we implemented a machine-learning-based approach in which gene-expression-based predictive models of drug response are constructed on approximately 700 cancer cell lines; these models are then applied to nearly 15,000 TCGA tumor samples, for which gene expression data are also available, yielding a predicted drug sensitivity value for each TCGA sample. We used this approach to impute a drug sensitivity estimate for 138 drugs that had been screened against the cell lines. The approach allowed us to transform TCGA into a vast pharmacogenomics dataset of unprecedented scale (and thus statistical power), which could be mined for novel associations relevant to chemotherapeutic response. As a proof of concept, we first investigated whether the small number of known clinically relevant associations could be recapitulated in these imputed data. As an example, the predicted sensitivity to Lapatinib, which inhibits ERBB2, was significantly greater in ERBB2-amplified tumors (P = 6 × 10⁻¹²), an association that is drug-specific.
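The imputation step described above (fit a model on cell lines, then apply it to tumors) can be illustrated with a minimal sketch. The abstract does not name the model class, so ridge regression is an assumption made here purely for illustration; the data are simulated, and every name (`fit_ridge`, `predict`, `simulate`, `cell_expr`, `tumor_expr`) is hypothetical rather than taken from the authors' code.

```python
import random


def solve_linear(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w


def fit_ridge(X, y, lam=1.0):
    """Ridge regression on centered data (an assumed model class, not the authors')."""
    n, p = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - mu[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Normal equations with an L2 penalty: (Xc'Xc + lam*I) w = Xc'yc
    A = [[sum(Xc[i][a] * Xc[i][k] for i in range(n)) + (lam if a == k else 0.0)
          for k in range(p)] for a in range(p)]
    rhs = [sum(Xc[i][a] * yc[i] for i in range(n)) for a in range(p)]
    return solve_linear(A, rhs), mu, y_mean


def predict(model, X):
    """Apply a fitted model to new expression profiles."""
    w, mu, y_mean = model
    return [y_mean + sum(wj * (xj - mj) for wj, xj, mj in zip(w, row, mu))
            for row in X]


# Simulated stand-ins: 3 "genes", a known linear drug-response signal plus noise.
rng = random.Random(0)
true_w = [1.5, -2.0, 0.5]


def simulate(n):
    X = [[rng.gauss(0, 1) for _ in true_w] for _ in range(n)]
    y = [sum(w * x for w, x in zip(true_w, row)) + rng.gauss(0, 0.1) for row in X]
    return X, y


cell_expr, cell_response = simulate(60)   # cell lines: expression + measured response
tumor_expr, _ = simulate(200)             # tumors: expression only is used

model = fit_ridge(cell_expr, cell_response, lam=1.0)
imputed = predict(model, tumor_expr)      # one imputed sensitivity value per tumor
print(f"imputed sensitivity for {len(imputed)} simulated tumor samples")
```

In a real analysis the cell-line and tumor expression data would first have to be placed on a comparable scale; that step is omitted from this sketch.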
In ERBB2-amplified tumors, large regions of chromosome 17 containing many genes are typically amplified; this recurring phenomenon often makes it impossible to identify the causative genes in genome-wide copy number data using conventional approaches. Strikingly, however, by interrogating the observed effect size of each gene in this amplicon, ERBB2 could be identified as the precise causative drug target in our data, demonstrating the increase in power obtained from this approach. Additionally, the second most significant association for Lapatinib is with EGFR, the known secondary target of this drug (P = 5 × 10⁻⁴). We also identified many novel associations; for example, copy number amplification of ERLIN2, a gene shown to play a role in stabilizing microtubules, is associated with resistance to a class of chemotherapeutics that interact with microtubules. This association was subsequently validated in follow-up experiments.

Citation Format: Paul Geeleher, R. Stephanie Huang, Fan Wang, Gladys Morrison. Imputation of drug sensitivity levels in The Cancer Genome Atlas using machine learning identifies novel predictors of chemotherapeutic response. [abstract]. In: Proceedings of the 107th Annual Meeting of the American Association for Cancer Research; 2016 Apr 16-20; New Orleans, LA. Philadelphia (PA): AACR; Cancer Res 2016;76(14 Suppl):Abstract nr 2100.
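The amplicon analysis in the abstract, in which co-amplified genes are compared by effect size, can be illustrated with a small simulation. This is only a sketch under stated assumptions: the abstract does not specify the test statistic, so a difference in means with a Welch t-statistic is assumed here; the data are simulated; `NEIGHBOR_1` and `NEIGHBOR_2` are hypothetical placeholder genes; and lower imputed values are taken to mean greater sensitivity.

```python
import random
from statistics import mean, variance


def welch_effect(values, amplified):
    """Return (mean difference, Welch t-statistic) for amplified vs. other tumors."""
    amp = [v for v, a in zip(values, amplified) if a]
    rest = [v for v, a in zip(values, amplified) if not a]
    diff = mean(amp) - mean(rest)
    se = (variance(amp) / len(amp) + variance(rest) / len(rest)) ** 0.5
    return diff, diff / se


rng = random.Random(1)
n_tumors = 400

# Simulated amplification calls: ERBB2 is amplified in roughly 20% of tumors.
erbb2 = [rng.random() < 0.2 for _ in range(n_tumors)]


def co_amplified(driver):
    """Hypothetical neighbor gene: usually, but not always, amplified with the driver."""
    return [(rng.random() < 0.7) if d else (rng.random() < 0.05) for d in driver]


amplicon = {
    "ERBB2": erbb2,
    "NEIGHBOR_1": co_amplified(erbb2),  # placeholder gene, not a real locus
    "NEIGHBOR_2": co_amplified(erbb2),  # placeholder gene, not a real locus
}

# Simulated imputed sensitivity: only ERBB2 status shifts the value (assumption).
imputed = [(-1.0 if a else 0.0) + rng.gauss(0, 0.5) for a in erbb2]

results = {gene: welch_effect(imputed, status) for gene, status in amplicon.items()}
ranked = sorted(results, key=lambda g: abs(results[g][0]), reverse=True)
for gene in ranked:
    diff, t = results[gene]
    print(f"{gene}: mean difference = {diff:.2f}, Welch t = {t:.1f}")
```

Because the neighbor genes are amplified in only a subset of the ERBB2-amplified tumors (and in a few tumors without ERBB2 amplification), their apparent effect is diluted, so the simulated causal gene ranks first by effect size.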
