Abstract

Laboratories conducting high volumes of RNA sequencing must be extremely wary of technical batch effects if samples are to be compared across extended time periods, which is imperative for the best-powered analyses of cancer transcriptomes. Changes in reagents, protocols, or technologies used in nucleic acid extraction, library preparation, and sequencing can alter transcriptomes in ways that invalidate or complicate comparisons of samples from different batches, necessitating continuous monitoring. This monitoring can be particularly difficult when analyzing samples from distinct tissue sites, as tumor type is the major biological determinant of transcriptome variance in cancer. Brain and liver cancer transcriptomes, for example, are expected to differ so drastically that comparing them is not informative for batch effect detection. Detection methods must also be robust to disparate batch effects, which can manifest as minor changes in expression across many genes or as major changes in a subset of genes, making ad hoc detection infeasible. To overcome these challenges, we developed MaCoBED (matched cohort batch effect detection), a novel method that evaluates technical batch effects in a set of transcriptome samples (e.g., a flow cell) by pooling them with a set of validated reference samples matched by cancer type and tissue site. This pooled set of transcriptomes is then embedded in a low-dimensional space using Uniform Manifold Approximation and Projection (UMAP), and each component is tested for deviation from the reference set using a Wilcoxon test. Matching new and legacy samples by cancer type and tissue site ensures that differences in UMAP clustering are not driven by known biological factors. We found that UMAP was preferable to Principal Components Analysis (PCA).
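The matching and per-component testing steps described above can be sketched as follows. This is a minimal, hypothetical illustration, not the MaCoBED implementation: the helper names (`match_reference`, `rank_sum_test`, `detect_batch_effect`), the metadata layout (sample ID mapped to a cancer-type/tissue-site pair), the threshold `alpha`, and the Bonferroni correction across components are all assumptions that the abstract does not specify. The UMAP embedding of the pooled samples is taken as input (it could be produced, for example, with umap-learn's `umap.UMAP(n_components=2).fit_transform`), so the sketch itself needs only the Python standard library; the Wilcoxon rank-sum test uses the normal approximation.

```python
import math
from statistics import NormalDist


def rank_with_ties(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position in this run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) p-value via the normal
    approximation, without tie or continuity corrections."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one sample")
    ranks = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for group x
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return 2 * (1 - NormalDist().cdf(abs(z)))


def match_reference(batch_meta, reference_meta):
    """Hypothetical matching step: keep reference samples whose
    (cancer_type, tissue_site) pair also occurs in the new batch.
    Both arguments map sample_id -> (cancer_type, tissue_site)."""
    wanted = set(batch_meta.values())
    return [sid for sid, key in reference_meta.items() if key in wanted]


def detect_batch_effect(embedding, batch_ids, reference_ids, alpha=0.01):
    """Test every embedding component for a batch-vs-reference shift.

    `embedding` maps sample_id -> tuple of coordinates computed on the pooled
    batch + matched reference set (e.g. two UMAP components). Returns the
    per-component p-values and whether any component is flagged. The
    Bonferroni correction over components is an assumption of this sketch.
    """
    n_components = len(next(iter(embedding.values())))
    p_values = []
    for c in range(n_components):
        batch_vals = [embedding[s][c] for s in batch_ids]
        ref_vals = [embedding[s][c] for s in reference_ids]
        p_values.append(rank_sum_test(batch_vals, ref_vals))
    flagged = any(p < alpha / n_components for p in p_values)
    return p_values, flagged
```

Because the test is rank-based, it makes no distributional assumption about the embedding coordinates. Embedding the batch and its matched references in a single fit places both in one coordinate system, so their component values can be compared directly.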
UMAP can capture variability in just two dimensions, accentuating modest but consistent transcriptome differences among batches that would otherwise be spread across multiple minor principal components, which makes batch effects more obvious and readily detectable. This approach detected a number of simulated batch effects with high specificity and sensitivity relative to randomly sampled validated legacy samples. Thus, we propose MaCoBED as a simple and rapid approach for batch effect monitoring of high-throughput RNA sequencing datasets that is versatile in detecting distinct kinds of batch effects, easily automatable, readily interpretable upon visualization, and applicable to both small and large batch sizes.

Citation Format: Joshua Drews, Joshua Bell, Wesley Munson, Saksham Saini, Benjamin Leibowitz, Jackson Michuda, Calvin McCarter, Lee Langer, Catherine Igartua, Kevin White. Robust detection of sequencing batch effects in RNA through low dimensional embedding with subtype-matched reference samples [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 5466.
