Abstract 2255: Using tumor sample gene expression data to infer tumor purity levels with stochastic gradient boosting machines

Yuanyuan Li, Adrienna Bingham, Yuan Zhuang, David M Umbach, Qi-Jing Li, Leping Li

doi:10.1158/1538-7445.am2018-2255

Abstract

Tumor purity is the percentage of cancer cells present in a sample of tumor tissue. The noncancerous cells (stromal cells) in a tumor are thought to play an important role in tumor growth, metastatic progression, and drug resistance. They also strongly influence genomic analyses of tumor samples. The Cancer Genome Atlas (TCGA) has extensive RNA-seq data from tumor tissue samples as well as assessments of tumor purity for those samples. Our goal is to select, for each tumor type, a subset of genes whose expression levels are predictive of tumor purity, as well as a subset of genes whose expression levels are predictive of purity when samples from all tumor types are pooled. We hope that the selected genes may provide insight into the cell-type composition of tumor samples and into the similarities and differences among tumor microenvironments.

We used TCGA data covering 11 tumor types, with genome-wide gene expression measurements on over 3,148 samples. To identify predictive genes, we used XGBoost, a supervised machine learning algorithm based on boosted regression tree ensembles. We carried out 100 repeated runs of 10-fold cross-validation (a total of 1,000 train-test partitions) for each tumor type and for all tumor types combined. On the training-set samples, XGBoost selects a set of genes to predict tumor purity levels; the selected genes are then used to predict the purity levels of the test-set samples. Across the 11 tumor types, the test-set root-mean-squared error, averaged over the 1,000 train-test partitions, ranged from 0.09 to 0.16. For each tumor type, we selected the top 250 genes based on their aggregated feature importance scores, a measure of each gene's contribution to tumor purity estimation.
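The evaluation protocol described above (repeated 10-fold cross-validation, test-set root-mean-squared error, and ranking genes by importance scores summed over all partitions) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the function names, the synthetic data, and in particular `fit_best_single_gene`, a toy single-gene linear model that stands in for the XGBoost regressor. It is not the authors' code and does not reproduce their results.

```python
import math
import random
from collections import defaultdict


def repeated_kfold(n_samples, n_folds=10, n_repeats=100, seed=0):
    """Yield (train_idx, test_idx) for n_repeats * n_folds partitions."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random split for every repeat
        for fold in range(n_folds):
            test = order[fold::n_folds]
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test


def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def fit_best_single_gene(X_train, y_train):
    """Toy stand-in for a boosted tree model (hypothetical helper).

    Fits a least-squares line on the one gene that best explains purity and
    returns (predict_fn, importances), with importance 1.0 for that gene.
    """
    n, n_genes = len(y_train), len(X_train[0])
    y_mean = sum(y_train) / n
    best = None  # (sum of squared errors, gene index, slope, intercept)
    for g in range(n_genes):
        col = [row[g] for row in X_train]
        x_mean = sum(col) / n
        sxx = sum((x - x_mean) ** 2 for x in col)
        if sxx == 0:
            continue  # a constant gene carries no information
        sxy = sum((x - x_mean) * (t - y_mean) for x, t in zip(col, y_train))
        slope = sxy / sxx
        intercept = y_mean - slope * x_mean
        sse = sum((t - (intercept + slope * x)) ** 2 for x, t in zip(col, y_train))
        if best is None or sse < best[0]:
            best = (sse, g, slope, intercept)
    if best is None:  # every gene was constant: fall back to the mean
        return (lambda row: y_mean), [0.0] * n_genes
    _, g, slope, intercept = best
    importances = [1.0 if j == g else 0.0 for j in range(n_genes)]
    return (lambda row: intercept + slope * row[g]), importances


def evaluate(X, y, gene_names, fit, top_k=250, n_folds=10, n_repeats=100, seed=0):
    """Return (mean test RMSE, top_k genes by summed importance)."""
    errors = []
    total_importance = defaultdict(float)
    for train, test in repeated_kfold(len(y), n_folds, n_repeats, seed):
        predict, importances = fit([X[i] for i in train], [y[i] for i in train])
        errors.append(rmse([y[i] for i in test], [predict(X[i]) for i in test]))
        for gene, score in zip(gene_names, importances):
            total_importance[gene] += score  # aggregate across partitions
    ranked = sorted(total_importance, key=total_importance.get, reverse=True)
    return sum(errors) / len(errors), ranked[:top_k]


# Synthetic demo data (not TCGA): 60 samples, 5 genes, purity driven by "g2".
rng = random.Random(1)
gene_names = ["g0", "g1", "g2", "g3", "g4"]
X = [[rng.gauss(0, 1) for _ in gene_names] for _ in range(60)]
y = [0.6 - 0.1 * row[2] + rng.gauss(0, 0.02) for row in X]

mean_rmse, top_genes = evaluate(
    X, y, gene_names, fit_best_single_gene, top_k=2, n_folds=10, n_repeats=3
)
print(f"mean test RMSE: {mean_rmse:.3f}; top genes: {top_genes}")
```

To approximate the setup in the abstract, one would pass a `fit` function that trains a gradient-boosted regressor and returns its per-gene importance scores, with `n_repeats=100` and `top_k=250`. How the authors aggregated importance scores across partitions is not specified here, so the plain sum is only one plausible choice.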
No single gene was among the top 250 in all 11 tumor types; however, ACAP1, AMICA1, CSF2RB, CYTIP, GGT5, GLIPR1, IRF4, and PECAM1 were among the top 250 both in more than 6 tumor types and when all tumor types were combined, suggesting that these genes might serve as biomarkers for tumor purity. Gene ontology analysis of the top genes most commonly returned various immune and signaling pathways. In summary, we used XGBoost to identify genes whose expression levels were associated with tumor purity levels in each tumor type. Our results suggest that assessed tumor purity levels in tumor samples can be faithfully recapitulated using certain subsets of genes. We believe that the genes selected for each tumor type by our unbiased approach might provide insight into the biology of the tumor microenvironment; for example, cell type-specific marker genes among them would indicate the presence of the corresponding cell types.

Citation Format: YuanYuan Li, Adrienna Bingham, Qi-Jing Li, Yuan Zhuang, David M. Umbach, Leping Li. Using tumor sample gene expression data to infer tumor purity levels with stochastic gradient boosting machines [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr 2255.


Journal: Cancer Research	Publication Date: Jul 1, 2018
Citations: 1

