Abstract
Next-generation sequencing-based genomic profiling is now a mainstay of pediatric oncology research and clinical testing. Correlating genomic features of patient cancer genomes with curated data extracted from large reference cohorts is critical for identifying molecular subtypes and underlying mutagenesis processes. To facilitate such investigation, we developed two user-friendly workflows on St. Jude Cloud, a data sharing ecosystem hosting genomic data for >10,000 pediatric cancer patients and survivors. These workflows leverage St. Jude Cloud's comprehensive pediatric cancer genomic data, including 1,616 RNA-Seq samples spanning 135 cancer subtypes and 958 whole genome sequencing (WGS) samples spanning 35 subtypes, so that users can analyze their own data in the context of St. Jude Cloud cohorts without downloading large datasets. The RNA-Seq Expression Classification workflow enables a user to compare their patient RNA-Seq gene expression data with blood (832), brain (456), and solid tumor (319) pediatric cancer reference cohorts and PDX models (45), enabling subtype classification using t-Distributed Stochastic Neighbor Embedding (t-SNE). Reference cohorts include curated subtype-defining somatic alterations, integrating genomic variant data with expression profiles. The resulting interactive t-SNE plots can be explored and annotated, with options to highlight cancer subtypes or samples and to display sample information (age of onset, clinical diagnosis, molecular driver). To demonstrate, we analyze PAWNXH, a Children's Oncology Group AML sample with a novel ZBTB7A-NUTM1 fusion, and find that it clusters with AML samples harboring KMT2A rearrangements, suggesting a potential mechanism of pathogenesis. Integrating PDX samples enables model selection for functional experiments by connecting patient subtypes with mouse models.
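As a rough illustration of the idea behind the RNA-Seq Expression Classification workflow, the following minimal sketch embeds a query sample together with a reference cohort using t-SNE. It is not the St. Jude Cloud implementation: the function names (`make_synthetic_cohort`, `embed_with_reference`), the synthetic Poisson counts, the log2 transform, and the perplexity value are all assumptions chosen for illustration, and scikit-learn's `TSNE` stands in for whatever embedding code the workflow actually uses.

```python
import numpy as np
from sklearn.manifold import TSNE


def make_synthetic_cohort(n_per_group=30, n_genes=50, seed=0):
    """Build a toy reference cohort with two hypothetical subtypes plus one query sample."""
    rng = np.random.default_rng(seed)
    # Two subtypes that differ in mean expression across the first half of the genes.
    means_a = np.full(n_genes, 20.0)
    means_b = means_a.copy()
    means_b[: n_genes // 2] = 200.0
    group_a = rng.poisson(means_a, size=(n_per_group, n_genes))
    group_b = rng.poisson(means_b, size=(n_per_group, n_genes))
    reference = np.vstack([group_a, group_b]).astype(float)
    labels = ["subtype_A"] * n_per_group + ["subtype_B"] * n_per_group
    # The query sample is drawn from subtype B's expression profile.
    query = rng.poisson(means_b, size=(1, n_genes)).astype(float)
    return reference, labels, query


def embed_with_reference(reference, query, perplexity=10.0, seed=0):
    """Jointly embed reference and query expression profiles in 2D with t-SNE.

    reference: (n_ref, n_genes) array; query: (n_query, n_genes) array.
    Returns (ref_xy, query_xy), the 2D coordinates of each set.
    """
    combined = np.vstack([reference, query])
    # log2(x + 1) dampens the influence of very highly expressed genes (assumed preprocessing).
    logged = np.log2(combined + 1.0)
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    xy = tsne.fit_transform(logged)
    return xy[: len(reference)], xy[len(reference):]


reference, labels, query = make_synthetic_cohort()
ref_xy, query_xy = embed_with_reference(reference, query)
```

In such a sketch, the query sample's coordinates in `query_xy` would be plotted alongside `ref_xy`, colored by `labels`, so that the reference subtype whose samples surround the query point can be read off visually. Because t-SNE has no out-of-sample transform in scikit-learn, the query is embedded jointly with the reference rather than projected afterward.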
The Mutational Signatures workflow identifies and quantifies COSMIC mutational signatures in user-uploaded somatic VCF files for comparison to reference pediatric cancer cohorts. The interactive interface enables rapid identification of signatures within the query cohort and facilitates comparison to the reference using a cohort-level summary view. Identified signatures may also be explored at the sample level for both query and reference cohorts, enabling the user to identify samples with signatures of interest for further analysis. We show an example comparison of mutational signatures identified in pediatric and adult AML samples. These workflows enable users to leverage curated pediatric cancer data to make discoveries in their own samples. Enabling point-and-click analysis in St. Jude Cloud removes a barrier for non-computational researchers and eliminates the need to download large reference datasets for local analysis. Both workflows use post-processed rather than raw genomic data, reducing transfer costs for uploading user data to the cloud.

Citation Format: Andrew Thrasher, Michael Macias, Alexander M. Gout, Delaram Rahbarinia, Xin Zhou, Samuel W. Brady, Clay McLeod, Michael C. Rusch, Xiaolong Chen, Soheil Meshinchi, Michael A. Dyer, Suzanne J. Baker, Martine F. Roussel, Jinghui Zhang. Empowering point-and-click genomic analysis with large pediatric genomic reference data on St. Jude Cloud [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 2289.