Abstract

Background

Recent highly publicized cases of premature patient assignment into clinical trials, resulting from non-reproducible omics analyses, have prompted many to call for a more thorough examination of translational omics and highlighted the critical need for transparency and reproducibility to ensure patient safety. The use of workflow platforms such as Galaxy and Taverna has greatly enhanced the use, transparency and reproducibility of omics analysis pipelines in the research domain, and such platforms would be an invaluable tool in a clinical setting. However, using these workflow platforms requires deep domain expertise that, particularly within the multi-disciplinary fields of translational and clinical omics, may not always be present in a clinical setting. This lack of domain expertise may put patient safety at risk and make these workflow platforms difficult to operationalize in a clinical setting. In contrast, semantic workflows are a different class of workflow platform in which the resultant workflow runs are transparent, reproducible, and semantically validated. Through semantic enforcement of all datasets, analyses and user-defined rules/constraints, users are guided through each workflow run, enhancing analytical validity and patient safety.

Methods

To evaluate the effectiveness of semantic workflows within translational and clinical omics, we have implemented a clinical omics pipeline for annotating DNA sequence variants identified through next generation sequencing using the Workflow Instance Generation and Specialization (WINGS) semantic workflow platform.

Results

We found that the implementation and execution of our clinical omics pipeline in a semantic workflow helped us to meet the requirements for enhanced transparency, reproducibility and analytical validity recommended for clinical omics. We further found that many features of the WINGS platform were particularly primed to help support the critical needs of clinical omics analyses.

Conclusions

This is the first implementation and execution of a clinical omics pipeline using semantic workflows. Evaluation of this implementation provides guidance for their use in both translational and clinical settings.

Electronic supplementary material

The online version of this article (doi:10.1186/s13073-015-0202-y) contains supplementary material, which is available to authorized users.

Highlights

  • Recent highly publicized cases of premature patient assignment into clinical trials, resulting from non-reproducible omics analyses, have prompted many to call for a more thorough examination of translational omics and highlighted the critical need for transparency and reproducibility to ensure patient safety

  • The National Institute of Standards and Technology (NIST), in conjunction with the Genome in a Bottle Consortium, recently published a set of high-confidence, genome-wide single-nucleotide polymorphism (SNP), indel and genotype calls, based on a genome sequence that they have established as a DNA reference material and made freely available to be used as a truth table in the benchmarking of bioinformatics methods for identifying DNA variants from sequenced genomes [15]

  • All identified DNA sequence variants are annotated with the following information: 1) potential effect on the resultant protein(s); 2) annotation within the Catalogue of Somatic Mutations in Cancer (COSMIC) database [29]; and 3) annotation within the Single Nucleotide Polymorphism database [30]

Summary

Introduction

Recent highly publicized cases of premature patient assignment into clinical trials, resulting from non-reproducible omics analyses, have prompted many to call for a more thorough examination of translational omics and highlighted the critical need for transparency and reproducibility to ensure patient safety. On the heels of the US Food and Drug Administration’s (FDA) approval of the first next-generation sequencing instrument [16], its recent public workshop on next-generation sequencing standards highlighted the critical need for the quality assurance of computational biology pipelines [17]. Towards these efforts, the National Institute of Standards and Technology (NIST), in conjunction with the Genome in a Bottle Consortium, recently published a set of high-confidence, genome-wide single-nucleotide polymorphism (SNP), indel and genotype calls, based on a genome sequence that they have established as a DNA reference material and made freely available to be used as a truth table in the benchmarking of bioinformatics methods for identifying DNA variants from sequenced genomes [15]. This is exemplified by a recent study in which over 1500 person hours were dedicated to the ‘forensic omics’ task of deciphering the exact data sets used and determining how the data were processed for assignment of patients to clinical trials [19].
