Abstract

The exponential growth and diversity of complex datasets pose severe challenges for data access and sharing, analysis, and compute power. The Cancer Genomics Cloud (CGC), powered by Seven Bridges, is an NCI-funded resource that provides a unified platform for cancer data analysis by co-localizing three components within the cloud: 1) large cancer datasets from The Cancer Genome Atlas (TCGA), the Clinical Proteomic Tumor Analysis Consortium (CPTAC), and several others; 2) >400 bioinformatics tools and best-practice workflows for analyzing multi-omics data; and 3) the computational capabilities to do large-scale analyses. The user-friendly CGC portal allows researchers to browse, query, and filter datasets of interest and to bring their own data for collaborative analysis in the context of other publicly available data. In addition to simple data access and management, the CGC provides the flexibility to bring private tools and the ability to complete reproducible and interactive analyses, all with the speed of cloud computing resources and without needing any cloud provider accounts or managed billing. Since its launch in 2016, the platform has been continuously improved with new datasets, applications, and features to broaden the data and workflows accessible to cancer researchers. We have enabled true multi-omic analysis by integrating with new data nodes within the Cancer Research Data Commons, which facilitate access to and analysis of proteomic, canine, and other data alongside the genomic data on the CGC. We have also expanded our infrastructure to run computations in the location where the data lives, thereby simplifying the user experience. Currently, analysis is supported on data held in both Google and Amazon cloud environments.
Using Connected Cloud Storage, datasets residing in the AWS Registry of Open Data, such as gnomAD or 1000 Genomes, can be easily attached as volumes, and these volumes are treated like any other file repository. Interactive analysis can be performed using RStudio as well as Jupyter and Julia notebooks, and is tailored to maximize user experience (including billing controls, flexibility, etc.). With a keen focus on interoperability, the CGC has implemented services that support technical standards recommended by the Global Alliance for Genomics and Health (GA4GH), including DRS, WES, and TRS. In summary, the CGC connects researchers to diverse datasets from a wide range of sources distributed across multiple clouds. The CGC empowers users to complete their entire workflow on the platform while streamlining collaboration and shortening the time from hypothesis to conclusion. Altogether, these added features enable a network of findable, accessible, interoperable, and reusable (FAIR) datasets, workflows, and services, making cancer data analysis faster and more easily available to all.

Citation Format: Sai Lakshmi Subramanian, Manisha Ray, Jack DiGiovanna, Jelena Radenkovic, Marko Tosic, Nikola Mirkovic, Milos Stanojevic, Milos Trboljevac, Vladan Andjus, Ana Stelkic, Brandi Davis-Dusenbery. The Cancer Genomics Cloud: A secure and scalable cloud-based platform to access, share and analyze multi-omics datasets [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 253.
