Abstract
The Proteomic Data Commons (PDC) hosts cancer proteomics data with the goal of making these data publicly available to support the development of cancer diagnostics, treatments, and progression tracking. As a part of the Cancer Research Data Commons (CRDC), the Terra platform provides a cloud workbench for the PDC data. FireCloud is an NCI-funded Broad Institute project that empowers cancer researchers to access data, run analysis tools, and collaborate securely in the cloud. It is powered by Terra, a secure, scalable, cloud-native platform developed by the Broad Institute, Microsoft, and Verily, an Alphabet company. It provides batch workflow execution, interactive analysis including data visualization, and ~2,900 publicly available tools, with the ability to import more tools from Dockstore. The integration of PDC and Terra enables researchers to use the data navigation and file-level search capabilities of the PDC web portal (pdc.cancer.gov) and export selected data manifests to a Terra workspace. Specifically, this integrated hand-off transfers metadata and cloud links to PDC data, in Portable Format for Bioinformatics (PFB), to Terra for use in analysis with cloud resources. A featured workspace for Terra-PDC integration includes tools for (i) downloading data files and relevant additional metadata; (ii) organizing the data and metadata for running the FragPipe workflow, a comprehensive collection of tools for reading and processing raw mass spectrometry (MS) data; and (iii) running a pre-configured FragPipe workflow to process isobarically labeled MS data from Tandem Mass Tag (TMT) or isobaric tags for relative and absolute quantitation (iTRAQ) experiments, using information automatically imported from PDC via the PFB import. The pipelines can also be customized as needed to process any type of raw MS data the PDC supports.
In addition, Terra connects to the NCI Genomic Data Commons (GDC) and hosts The Cancer Genome Atlas (TCGA) and Therapeutically Applicable Research to Generate Effective Treatments (TARGET) datasets, allowing users to pull in other data types for co-analysis. Taken together, using CRDC cloud resources in conjunction with Terra allows analysis of data on a massive scale, enabling multi-omic integration spanning the genomic and proteomic commons data, with easy sharing of data and tools.
Citation Format: Emily LaPlante, Bingxing Huo, DR Mani, Ratna R. Thangudu. Optimizing proteomic data access and analysis in the cloud: Leveraging Terra's integration with the Proteomic Data Commons [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 7420.