Abstract
While whole-genome (WGS), whole-exome (WES), and RNA-Seq data from patient samples are key resources for the development of precision medicine, major computing infrastructure is typically required to use them effectively. The St Jude Cloud (SJCloud, https://stjude.cloud), built in collaboration with DNAnexus and Microsoft, aims to remove this barrier by sharing genomic sequencing data generated at St Jude Children's Research Hospital, making complex bioinformatics pipelines easily accessible, and providing intuitive visualizations for data mining in the cloud. Over 5,000 WGS, 6,000 WES, and 1,500 RNA-Seq datasets from >5,000 pediatric cancer patients, mapped to the latest reference genome, are securely available in SJCloud. These data were generated from three St Jude-funded genomic initiatives: the Pediatric Cancer Genome Project (PCGP), the St Jude Life Genome Project, and the Genomes for Kids Clinical Trial. SJCloud hosts BAM files; coding and non-coding somatic and germline SNVs and indels; and copy number (CNV) and structural (SV) alterations. Non-identifiable data (e.g., somatic alterations, genotype frequency, cancer diagnosis, and demographics) can be viewed immediately using our interactive genome browser, while access to raw data and individual genotypes requires a simple online approval. Data synchronization and visualization enable novel discoveries by non-bioinformaticians. For example, a genomic view of the TERT locus shows enrichment of CNVs and SVs in neuroblastoma (NBL), consistent with reports of activation via rearrangement. The same view also shows a somatic promoter mutation, C228T, in one NBL; to our knowledge, such mutations have not been reported in primary samples. This integrated view across somatic mutation types enables evaluation of the diverse genetic mechanisms deregulating cancer genes. SJCloud also facilitates data re-analysis. We ported the “MutationalPatterns” R package (Blokzijl et al. 2017) to the cloud to elucidate major mutational signatures in >500,000 PCGP WGS somatic variants. Inclusion of non-coding mutations was critical, as the low number of exonic mutations in some pediatric cancers is insufficient for robust analysis. A surprising finding was a signature consistent with ultraviolet-induced DNA damage in a subset of B-acute lymphoblastic leukemia. End-to-end workflows to detect gene fusions, predict neoepitopes, classify mutations, process ChIP-seq, and identify differentially expressed genes are also freely accessible. By integrating analytic tools with the world's largest set of pediatric genomics data, SJCloud enables data sharing and mining, innovative genomic analysis, and development of new analytic methods. We anticipate that in 2019 we will host data from over 10,000 pediatric cancer patients, and we are actively exploring approaches to make this a federated data repository capable of interchange with the global pediatric cancer research community.

Citation Format: Scott Newman, Xin Zhou, Clay McLeod, Michael Rusch, Gang Wu, Edgar Sioson, Shuoguo Wang, J. Robert Michael, Aman Patel, Michael N. Edmonson, Andrew Frantz, Ti-Cheng Chang, Yongjin Li, Robert I. Davidson, Singer Ma, Irina McGuire, Nedra Robison, Xing Tang, Lance Palmer, Ed Suh, Leigh Tanner, James McMurry, Keith Perry, Zhaoming Wang, Carmen Wilson, Yong Cheng, Mitch Weiss, Leslie L. Robison, Yutaka Yasui, Kim E. Nichols, David W. Ellison, James R. Downing, Jinghui Zhang. Access, visualize and analyze 5,000 whole-genomes from pediatric cancer patients on St. Jude Cloud [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr 922.