Abstract

The Cancer Genome Atlas (TCGA) network has generated more than 2.5 petabytes of genomic data over the last decade, with petabyte-scale additions of data expected in the coming years. Accessing and analyzing this information in a local compute environment is challenging due to the volume of data and the lack of sufficient computing resources at many research organizations. The Cancer Genomics Cloud Pilot project from the National Cancer Institute (NCI) has helped democratize access to TCGA by co-localizing data with computational resources on the cloud. Funded as part of this project, the Seven Bridges Cancer Genomics Cloud (CGC) hosts nearly 5 petabytes of public data from TCGA, the Simons Genome Diversity Project, the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative, The Cancer Imaging Archive (TCIA), and the Clinical Proteomic Tumor Analysis Consortium (CPTAC). The CGC provides academic researchers with a secure, scalable, cloud-based cancer research platform that includes collaborative tools for accessing, uploading, analyzing, and visualizing data. The platform uses resource description frameworks, data harmonization, and metadata curation to facilitate effective querying. Bioinformatics tools are implemented on the CGC using the Common Workflow Language (CWL), an emerging standard for describing computational workflows, to support computational reproducibility. Since its launch in 2016, the CGC has enabled researchers from around the world to study human genetics and cancer biology through the analysis of large public datasets and private data in a cloud computing environment. In this poster, we present an example of analysis of TCGA data on the CGC. The OptiType tool for human leukocyte antigen (HLA) class I typing was used to profile 8,872 RNA-Seq samples present in the TCGA dataset. All samples were accurately processed within 2 days using a robust, fault-tolerant, and cost-efficient CWL description of OptiType that enabled analysis for less than 50 cents per sample on average. This case study demonstrates how cloud computing resources can facilitate the successful analysis of large cohorts of data using custom pipelines in a robust, scalable, and reproducible manner.

Citation Format: Raunaq Malhotra, Alexandar Krasnitz, Anurag Sethi, Erik Lehnert, Elizabeth H. Williams, Davis-Dusenbery N. Brandi. Low-cost and accurate human leukocyte antigen (HLA) class I typing of The Cancer Genome Atlas on the Seven Bridges Cancer Genomics Cloud [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr 2348.
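The cost and throughput figures quoted in the abstract can be turned into rough cohort-level bounds with simple arithmetic. The sketch below is illustrative only: the sample count and per-sample cost ceiling come from the abstract, while reading "within 2 days" as at most 48 wall-clock hours is an assumption, and the function names are hypothetical.

```python
# Figures taken from the abstract.
SAMPLES = 8872                    # RNA-Seq samples profiled with OptiType
MAX_COST_PER_SAMPLE_USD = 0.50    # "less than 50 cents per sample on average"

# Assumption (not stated in the abstract): "within 2 days" means <= 48 hours.
WALL_CLOCK_HOURS = 48


def cost_upper_bound(samples: int, max_cost_per_sample: float) -> float:
    """Upper bound on total spend if every sample costs at most the ceiling."""
    return samples * max_cost_per_sample


def min_throughput_per_hour(samples: int, hours: float) -> float:
    """Lower bound on average throughput if the run finished within `hours`."""
    return samples / hours


total = cost_upper_bound(SAMPLES, MAX_COST_PER_SAMPLE_USD)
rate = min_throughput_per_hour(SAMPLES, WALL_CLOCK_HOURS)

print(f"Total cost bound: under ${total:,.2f}")
print(f"Average throughput: at least {rate:.0f} samples/hour")
```

Under these assumptions the whole cohort comes in below roughly $4,436, at an average of about 185 samples per hour. Both numbers are bounds derived from the abstract's rounded figures, not reported results.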

