Abstract

The BioCompute Object (BCO) standard is an IEEE standard (IEEE 2791-2020) designed to facilitate the communication of next-generation sequencing data analyses, with applications across academia, government agencies, and industry. For example, the Food and Drug Administration (FDA) supports the standard for regulatory submissions and includes it in its Data Standards Catalog for the submission of high-throughput sequencing (HTS) data. We created the BCO App to facilitate BCO generation in a range of computational environments and, in part, to participate in the Advanced Track of the precisionFDA BioCompute Object App-a-thon. The application generates BCOs both from workflow metadata provided as plaintext and from workflow contents written in the Common Workflow Language (CWL). It can also access and ingest task execution results from the Cancer Genomics Cloud (CGC), an NCI-funded computational platform. Creating a BCO from a CGC task significantly reduces the time required to generate a BCO on the CGC, because workflow information fields are auto-populated from CGC workflow and task execution results. The BCO App supports exporting BCOs as JSON or PDF files and publishing them to the CGC platform and to GitHub repositories.
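Generating a BCO from CWL can be pictured as mapping workflow steps onto the BCO description domain. The sketch below is hypothetical: the function name `cwl_steps_to_pipeline_steps` and the field choices are our assumptions, not the BCO App's implementation. It assumes the CWL document has already been parsed into a Python dict (CWL may be written in JSON or YAML; JSON keeps the example dependency-free) and fills only the step number, name, and description of each `pipeline_steps` entry.

```python
import json


def cwl_steps_to_pipeline_steps(workflow: dict) -> list[dict]:
    """Map the steps of a parsed CWL Workflow to BCO pipeline_steps entries.

    Hypothetical helper: an illustrative mapping, not the BCO App's logic.
    """
    if workflow.get("class") != "Workflow":
        raise ValueError("expected a CWL document with class: Workflow")

    steps = workflow.get("steps", [])
    # CWL allows steps either as a list of objects with "id" or as a map keyed by id.
    if isinstance(steps, dict):
        steps = [{"id": step_id, **body} for step_id, body in steps.items()]

    pipeline_steps = []
    for number, step in enumerate(steps, start=1):
        # Packed CWL ids can look like "#main/align"; keep only the last segment.
        step_id = str(step["id"]).split("/")[-1].lstrip("#")
        run = step.get("run", {})
        # "run" may be an inline tool description (dict) or a path/URI string.
        label = run.get("label", step_id) if isinstance(run, dict) else step_id
        pipeline_steps.append({
            "step_number": number,
            "name": step_id,
            "description": step.get("doc", label),
            "input_list": [],   # placeholder: would hold resolved file URIs
            "output_list": [],  # placeholder: would hold resolved file URIs
        })
    return pipeline_steps


# Usage with a small, made-up two-step workflow.
workflow = json.loads("""{
  "cwlVersion": "v1.0", "class": "Workflow", "inputs": [], "outputs": [],
  "steps": [
    {"id": "#align", "run": {"class": "CommandLineTool", "label": "Read alignment"},
     "in": [], "out": []},
    {"id": "#count", "run": "count.cwl", "in": [], "out": [],
     "doc": "Count reads per gene"}
  ]
}""")
print(json.dumps(cwl_steps_to_pipeline_steps(workflow), indent=2))
```

Input and output URIs are left empty here because CWL step inputs are parameter references; in practice they would only be known once a task has actually run, which is where execution results such as those from the CGC would come in.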

Highlights

  • The BioCompute Object (BCO) is an IEEE standard (IEEE 2791-2020) titled Bioinformatics Analyses Generated by High-Throughput Sequencing (HTS) to Facilitate Communication[1]

  • Use cases: We demonstrate the process of generating a BioCompute Object using the BCO App with a next-generation sequencing (NGS) data analysis workflow and its execution results available from the Cancer Genomics Cloud (CGC)

  • We developed the BCO App to facilitate the adoption of the BioCompute standard


Summary

Introduction

The BioCompute Object (BCO) is an IEEE standard (IEEE 2791-2020) titled Bioinformatics Analyses Generated by High-Throughput Sequencing (HTS) to Facilitate Communication[1]. In its simplest form, a BCO documents a workflow through nine domains (provenance, usability, extension, description, execution, parametric, input/output, error, and top-level fields), each containing two to twelve fields that specify the domain's characteristics (i.e., domain fields). The specification further clarifies workflow execution through the input/output domain and through the error domain, which defines expected errors. It also allows additional information describing the appropriate use of a workflow to be recorded in the usability and parametric domains. The BCO App accepts plaintext user inputs, workflow contents written in the Common Workflow Language (CWL), and task execution results from the Cancer Genomics Cloud (CGC), an NCI-funded computational platform[4], and from other similar informatics platforms. An example bioinformatics pipeline for RNA-seq differential expression analysis is used to demonstrate the BCO generation flow.
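To make the domain layout concrete, the following sketch builds an empty BCO-shaped Python dict containing the top-level fields and the eight named domains, then serializes it to JSON. The key names and the `spec_version` URL reflect our reading of the published IEEE 2791 JSON schema and should be checked against the official schema; `make_bco_skeleton`, the object ID, and the example values are hypothetical placeholders.

```python
import json
from datetime import datetime, timezone

# The eight named domains; together with the top-level fields they make nine.
# Key names are our reading of the IEEE 2791 JSON schema (verify before use).
BCO_DOMAINS = (
    "provenance_domain",
    "usability_domain",
    "extension_domain",
    "description_domain",
    "execution_domain",
    "parametric_domain",
    "io_domain",
    "error_domain",
)


def make_bco_skeleton(name: str, version: str, object_id: str) -> dict:
    """Return a minimal BCO-shaped dict with every domain present but mostly empty."""
    now = datetime.now(timezone.utc).isoformat()
    return {
        # Top-level fields (spec_version URL is an assumption; etag is a placeholder
        # to be replaced by a checksum of the finished object).
        "object_id": object_id,
        "spec_version": "https://w3id.org/ieee/ieee-2791-schema/2791object.json",
        "etag": "",
        "provenance_domain": {
            "name": name,
            "version": version,
            "created": now,
            "modified": now,
            "contributors": [],
            "license": "",
        },
        "usability_domain": [],   # free-text statements of appropriate use
        "extension_domain": [],   # optional user-defined extensions
        "description_domain": {"keywords": [], "pipeline_steps": []},
        "execution_domain": {
            "script": [],
            "script_driver": "",
            "software_prerequisites": [],
            "external_data_endpoints": [],
            "environment_variables": {},
        },
        "parametric_domain": [],  # non-default parameters: param, value, step
        "io_domain": {"input_subdomain": [], "output_subdomain": []},
        "error_domain": {"empirical_error": {}, "algorithmic_error": {}},
    }


# Usage: start a skeleton for the RNA-seq example and add a usability statement.
bco = make_bco_skeleton(
    name="RNA-seq differential expression (example)",
    version="1.0",
    object_id="https://example.org/BCO_000001",  # hypothetical identifier
)
bco["usability_domain"].append(
    "Identify genes differentially expressed between two conditions."
)
print(json.dumps(bco, indent=2))
```

Keeping every domain present, even when empty, is one way a form-driven tool could fill fields incrementally; a finished BCO would still need its required fields populated and checked against the schema.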
