Abstract

Gene expression profiling is widely used in oncology research and for clinical decision-making. Although gene expression values correlate across platforms, each measurement should ideally be evaluated against a cohort of samples sequenced with the same methodology. Clinical samples, preserved as formalin-fixed, paraffin-embedded (FFPE) tissue, often undergo exome capture-based RNA-seq; research samples, stored as fresh/frozen (FF) tissue, undergo poly-A RNA-seq, which produces high-quality expression data. Thus, sequencing protocols and data processing algorithms must be developed to obtain gene expression measurements of the same quality from FFPE samples. Further, while several batch effect correction algorithms exist to neutralize the batch effect between samples across large cohorts, the majority cannot be applied to an individual sample, raising the need for a single-sample projection algorithm to improve gene expression-based personalized clinical decision-making. To improve the quality of RNA reads from FFPE tissues, exome capture enrichment of RNA transcripts was optimized, and concordance with poly-A RNA-seq was increased by adding noncoding 3′ and 5′ UTR regions to the probes. After multiple extraction methods were tested, a correlation of 0.88 was achieved between the exome capture-based and poly-A RNA-seq protocols. To further align the sequencing methodologies, we designed an ML-based batch correction algorithm using a series of paired RNA-seq experiments in which the same sample was sequenced with both exome capture-based and poly-A RNA-seq; we applied linear modeling on the training subset (N = 64) and verified the performance on the validation subset (N = 24). For each gene, 5-20 correlated genes were selected from the combined TCGA pan-cancer datasets and used to train a Lasso model.
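The per-gene correction step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`train_gene_corrector`, `correct_sample`), the dictionary data layout, Pearson ranking of candidate genes, the inclusion of the gene's own capture-based value as a predictor, and the values of `k` and `alpha` are all hypothetical. The Lasso fit is a plain coordinate-descent routine written in pure Python so that the example stays self-contained.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def soft_threshold(z, t):
    """Shrink z toward zero by t (the Lasso proximal step)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_lasso(columns, y, alpha=0.05, n_iter=100):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + alpha*||b||_1.

    `columns` is a list of feature columns, each a list of n values.
    Returns (intercept, coefficients).
    """
    n = len(y)
    col_means = [sum(c) / n for c in columns]
    y_mean = sum(y) / n
    centered = [[v - m for v in c] for c, m in zip(columns, col_means)]
    resid = [v - y_mean for v in y]  # residual with all coefficients at zero
    beta = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, col in enumerate(centered):
            norm = sum(v * v for v in col) / n
            if norm == 0:
                continue  # constant feature carries no information
            # Covariance of feature j with the residual that excludes feature j
            rho = sum(c * (r + beta[j] * c) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, alpha) / norm
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, col_means))
    return intercept, beta


def train_gene_corrector(target, capture, polya, reference, k=10, alpha=0.05):
    """Fit one correction model for `target`.

    capture, polya: dict gene -> expression values over the same paired samples.
    reference: dict gene -> expression values over a reference cohort, used
    only to rank candidate predictor genes (assumption: Pearson ranking).
    """
    candidates = (g for g in reference if g != target and g in capture)
    ranked = sorted(
        candidates,
        key=lambda g: abs(pearson(reference[g], reference[target])),
        reverse=True,
    )
    # Assumption: the gene's own capture-based value is kept as a predictor
    predictors = [target] + ranked[:k]
    intercept, beta = fit_lasso([capture[g] for g in predictors], polya[target], alpha)
    return predictors, intercept, beta


def correct_sample(sample, model):
    """Project one capture-based sample (dict gene -> value) for one gene."""
    predictors, intercept, beta = model
    return intercept + sum(b * sample[g] for g, b in zip(predictors, beta))
```

Because each fitted model is just an intercept and a short coefficient vector, it can be applied to a single new sample without access to the rest of the cohort, which is the property the abstract emphasizes.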
Over 82% of genes (total N = 20,062) correlated across the two RNA-seq methodologies for each sample after correction (ccc value > 0.5), and approximately 94% of cancer-specific and microenvironment-related genes correlated (ccc value > 0.5). The algorithm significantly outperformed other batch correction methods, with ccc values > 0.8 for 51.37% of the 20,062 genes compared with ~3% for PCA, 26% for MNN, and 28% for ComBat. It also performed better on the 1,890 clinically relevant genes, correcting 77% of them to ccc values > 0.8 compared with 15% for PCA, 39% for MNN, and 40% for ComBat. Here, we developed a combinatory technology with a batch correction algorithm, trained on FFPE or FF tumor samples sequenced with exome capture-based or poly-A RNA-seq, that enables the projection of a single sample onto a larger cohort. Future application of this correction tool will enable direct analysis of gene expression in single tumor samples to support potential gene expression-based treatment decisions.

Citation Format: Nikita Kotlov, Kirill Shaposhnikov, Cagdas Tazearslan, Ilya Cheremushkin, Madison Chasse, Artur Baisangurov, Svetlana Podsvirova, Svetlana Korkova, Yaroslav Lozinsky, Katerina Nuzhdina, Elena Vasileva, Dmitry Kravchenko, Krystle Nomie, John Curran, Nathan Fowler, Alexander Bagaev. Combinatory technologies for single sample gene expression projection onto a cohort sequenced with a different technology for personalized clinical decision-making [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 1216.
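The ccc thresholds reported in the abstract can be illustrated with a short sketch. It assumes that "ccc" denotes Lin's concordance correlation coefficient, which the abstract does not spell out, and the function names and dictionary layout are hypothetical. Unlike Pearson correlation, this coefficient penalizes shifts in mean and scale, so it measures agreement between the two protocols rather than mere co-variation.

```python
def concordance_cc(x, y):
    """Lin's concordance correlation coefficient for two equal-length lists.

    ccc = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    using population (1/n) variances and covariance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    denom = vx + vy + (mx - my) ** 2
    # Undefined when both inputs are the same constant; report NaN
    return 2 * cov / denom if denom > 0 else float("nan")


def fraction_concordant(capture, polya, threshold=0.8):
    """Share of genes whose ccc across paired samples exceeds `threshold`.

    capture, polya: dict gene -> expression values over the same samples.
    Genes with an undefined ccc (NaN) count as not concordant.
    """
    genes = [g for g in capture if g in polya]
    hits = sum(1 for g in genes if concordance_cc(capture[g], polya[g]) > threshold)
    return hits / len(genes)
```

For example, a gene whose corrected values are perfectly correlated with the poly-A values but shifted by a constant offset would have a Pearson correlation of 1 and a ccc below 1, so it might fail a 0.8 threshold despite perfect correlation.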
