Abstract

PURPOSE Institutional efforts toward the democratization of cloud-scale data and analysis methods for cancer genomics are proceeding rapidly. As part of this effort, we bridge two major bioinformatic initiatives: the Global Alliance for Genomics and Health (GA4GH) and Bioconductor.

METHODS We describe in detail a use case in pancancer transcriptomics conducted by blending implementations of the GA4GH Workflow Execution Services and Tool Registry Service concepts with the Bioconductor curatedTCGAData and BiocOncoTK packages.

RESULTS We carried out the analysis with a formally archived workflow and container at dockstore.org and a workspace and notebook at app.terra.bio. The analysis identified relationships between microsatellite instability and biomarkers of immune dysregulation at a finer level of granularity than previously reported. Our use of standard approaches to containerization and workflow programming allows this analysis to be replicated and extended.

CONCLUSION Experimental use of dockstore.org and app.terra.bio in concert with Bioconductor enabled novel statistical analysis of large genomic projects without the need for local supercomputing resources but involved challenges related to container design, script archiving, and unit testing. Best practices and cost/benefit metrics for the management and analysis of globally federated genomic data and annotation are evolving. The creation and execution of use cases like the one reported here will be helpful in the development and comparison of approaches to federated data/analysis systems in cancer genomics.

Highlights

  • The computational initiatives of the Global Alliance for Genomics and Health (GA4GH) concern improvements in the efficiency of data management and analysis at a global level.[1]

  • We describe in detail a use case in pancancer transcriptomics conducted by blending implementations of the GA4GH Workflow Execution Services and Tool Registry Service concepts with the Bioconductor curatedTCGAData and BiocOncoTK packages.

  • We examine an approach to combining the GA4GH Tool Registry Services (TRS) and Workflow Execution Services (WES) concepts, as implemented in dockstore.org[2] and the Broad Institute Cromwell workflow execution engine, with data, annotation, and software resources and practices developed in the Bioconductor project.[3,4]


Summary

Introduction

The computational initiatives of the Global Alliance for Genomics and Health (GA4GH) concern improvements in the efficiency of data management and analysis at a global level.[1] We examine an approach to combining the GA4GH Tool Registry Services (TRS) and Workflow Execution Services (WES) concepts, as implemented in dockstore.org[2] and the Broad Institute Cromwell workflow execution engine, with data, annotation, and software resources and practices developed in the Bioconductor project.[3,4] The goal of an agile bioinformatic resource ecosystem requires principles of resource distribution and management that are coming into focus as new resources are brought to bear on problems of increasing size and importance. In this context, resources include data, annotation, software, documentation, analysis environments, and architectural materials related to overall system function, security, and evolution.
