The development of a computational machine learning tool to decipher malignant cell gene expression from complex tumor tissue.

Maksim Chelushkin, Nathan Fowler, Alexander Bagaev, Cagdas Tazearslan, Krystle Nomie, Boris Shpak, Michael F Goldberg, Madison H Chasse, Valentina Beliaeva, Dawn M Fernandez, Aleksandr Zaitsev, Iris Wang

doi:10.1200/jco.2021.39.15_suppl.e15050

Abstract

e15050 Background: Complex tumor tissue is composed of malignant cells and diverse tumor microenvironment (TME) cellular populations. Depending on the cancer type, the percentage of malignant cells in tumor tissue varies and is sometimes below 10%. TME cellular transcripts may therefore comprise the majority of the total transcripts in a tumor, potentially biasing biomarker development and clinical decision-making. However, TME gene expression can be subtracted from total tumor gene expression, leaving only malignant cell expression.

Methods: A computational tool was developed for the “subtraction” of TME-specific gene expression from total gene expression in an array of solid tumors, producing in silico “purified” malignant cell gene expression. To extract the malignant cell RNA expression values, a machine learning model was trained on an artificial transcriptome dataset created from different solid tumor cancer cell lines with the addition of various TME cellular proportions. The artificial transcriptomes, both training and test, were composed of 7,114 samples of purified TME cell types and 3,143 cancer cell lines. The final model relied on three major parameters: 1) the RNA percentages of deconvolved TME cell types; 2) the weighted sum of the average expression of a malignant cell target gene in the different TME cell populations, with the weights given by the RNA percentages of those populations as predicted by deconvolution; and 3) a set of genes expressed predominantly in the TME. To experimentally validate the computational model, different proportions of COLO829 (cutaneous melanoma), MCF-7 (invasive ductal carcinoma), and K562 (chronic myeloid leukemia) cell lines were mixed in vitro with PBMCs, creating controlled representations of tumor tissue, and RNA was extracted and sequenced. Gene expression was quantified and analyzed with the computational model, and the results were compared with pure cancer cell line expression.

Results: The expression of at least 120 clinically relevant biomarkers was reconstructed by applying the model to artificial transcriptomes in which the percentage of malignant cells varied from 10% to 90%. Relative to unprocessed data, the concordance correlation coefficient between pure cancer cell lines and the extracted malignant cell expression increased on average from 0.75 to 0.9 (e.g., PTEN from 0.2 to 0.88, RB1 from 0.57 to 0.89, ERBB2 from 0.83 to 0.99). In vitro validation showed that the tool improved the concordance correlation coefficient and mean absolute error (MAE) for many tumor biomarkers. For example, the PTEN correlation coefficient increased by 0.33 and its MAE was reduced 3-fold.

Conclusions: This novel computational tool can aid in treatment decision-making based on malignant cell expression, promoting the use of gene expression for personalized therapeutics.
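
The abstract does not specify the trained model, so the following is only a minimal Python sketch of two ideas it names: the weighted sum of average TME expression (the second model parameter in Methods) and the concordance correlation coefficient used in Results. It assumes linear-scale expression, exactly known TME RNA fractions, and no measurement noise, which real data will not satisfy. The function names `subtract_tme` and `concordance_correlation`, the three TME cell types, and all numeric values are hypothetical illustrations, not the authors' implementation or data.

```python
import random
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for two equal-length sequences."""
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = fmean([(a - mean_x) ** 2 for a in x])  # population variance
    var_y = fmean([(b - mean_y) ** 2 for b in y])
    cov_xy = fmean([(a - mean_x) * (b - mean_y) for a, b in zip(x, y)])
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


def subtract_tme(bulk, tme_fractions, tme_mean_expr):
    """Naive linear subtraction for one gene in one sample (an assumption,
    not the trained model from the abstract).

    bulk          : bulk expression of the gene
    tme_fractions : RNA fraction of each TME cell type in the sample
    tme_mean_expr : average expression of the gene in each TME cell type
    """
    # Weighted sum of average TME expression, weights = RNA fractions.
    tme_part = sum(w * m for w, m in zip(tme_fractions, tme_mean_expr))
    malignant_fraction = 1.0 - sum(tme_fractions)
    # Rescale the remainder to the expression of pure malignant cells.
    return (bulk - tme_part) / malignant_fraction


# Toy in silico mixtures with made-up numbers for a single gene.
rng = random.Random(0)
tme_mean_expr = [30.0, 12.0, 4.0]  # hypothetical: three TME cell types

true_malignant, bulk_values, recovered = [], [], []
for _ in range(200):
    malignant_fraction = rng.uniform(0.1, 0.9)  # 10% to 90% malignant RNA
    raw = [rng.gammavariate(1.0, 1.0) for _ in tme_mean_expr]
    fractions = [(1.0 - malignant_fraction) * r / sum(raw) for r in raw]
    expr = rng.lognormvariate(1.0, 0.5)  # "pure" malignant expression
    bulk = malignant_fraction * expr + sum(
        w * m for w, m in zip(fractions, tme_mean_expr)
    )
    true_malignant.append(expr)
    bulk_values.append(bulk)
    recovered.append(subtract_tme(bulk, fractions, tme_mean_expr))

ccc_before = concordance_correlation(true_malignant, bulk_values)
ccc_after = concordance_correlation(true_malignant, recovered)
print(f"CCC, unprocessed bulk vs pure: {ccc_before:.3f}")
print(f"CCC, subtracted vs pure:       {ccc_after:.3f}")
```

Because the toy mixtures are noise-free and the fractions are known exactly, the subtraction recovers the malignant expression almost perfectly. With deconvolution error and sequencing noise the residual would be amplified at low malignant fractions, which is one plausible reason to use a learned model rather than plain subtraction.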
