Abstract

In this work, we used the PROV-DM model to manage data provenance in the workflows of genome projects. This provenance model stores the details of a workflow execution, such as the raw and produced data and the computational tools used, with their versions and parameters. Using this model, biologists can access the details of one particular execution of a workflow, compare the results produced by different executions, and plan new experiments more efficiently. We also created a provenance simulator, which facilitates entering the provenance data of a genome project workflow execution. Finally, we discuss a case study that aims to identify genes involved in specific metabolic pathways of Bacillus cereus and to compare this isolate with other phylogenetically related bacteria from the Bacillus group. This B. cereus isolate is an extremophilic bacterium collected from warm water in the Midwestern Region of Brazil, and its DNA samples were sequenced with an NGS machine.

Highlights

  • The speed and efficiency with which scientific workflows may be performed have increased with the use of modern hardware and software technologies

  • The PROV-DM model makes it possible to store the properties of each execution of a bioinformatics workflow

  • The proposed provenance model was divided into two levels, one corresponding to the provenance graph itself and the other providing access to the data of a particular execution


Summary

Introduction

The speed and efficiency with which scientific workflows may be performed have increased with the use of modern hardware and software technologies. In the proposed model, a single activity can take part in several WasGeneratedBy relationships, but each collection or entity can be generated by only one activity. This definition models the processes executed in bioinformatics projects, where a particular piece of data (or file) can only be generated by a single program. The provenance simulator is shown displaying the Multiple Alignment case study, using information from a previously stored XML file. To use this simulator, the user needs to create a project, enter the provenance data (Table 1), and specify the graph nodes (Agent, Activity and Entity) and the relations linking these nodes (Figure 4). Annotations indicate the origin of some of the collections and programs used for the activity executions.
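The generation constraint described above can be illustrated with a minimal Python sketch. It is not the authors' simulator: the class `ProvenanceGraph`, its methods, and the node names (`multiple_alignment`, `filtering`, `alignment.aln`, `alignment.log`) are hypothetical, chosen only to show one way of letting an activity generate several entities while rejecting a second generating activity for the same entity.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceGraph:
    """Toy PROV-DM-style graph (hypothetical, not the paper's implementation)."""

    agents: set = field(default_factory=set)
    activities: set = field(default_factory=set)
    entities: set = field(default_factory=set)
    # Maps each entity to the single activity that generated it.
    generated_by: dict = field(default_factory=dict)

    def add_node(self, kind, name):
        """Register a node of kind 'Agent', 'Activity' or 'Entity'."""
        kinds = {
            "Agent": self.agents,
            "Activity": self.activities,
            "Entity": self.entities,
        }
        if kind not in kinds:
            raise ValueError(f"unknown node kind: {kind}")
        kinds[kind].add(name)

    def was_generated_by(self, entity, activity):
        """Record a WasGeneratedBy relation, enforcing one activity per entity."""
        if entity not in self.entities:
            raise KeyError(f"unknown entity: {entity}")
        if activity not in self.activities:
            raise KeyError(f"unknown activity: {activity}")
        existing = self.generated_by.get(entity)
        if existing is not None and existing != activity:
            raise ValueError(
                f"{entity} was already generated by {existing}"
            )
        self.generated_by[entity] = activity

    def outputs_of(self, activity):
        """Return, in sorted order, every entity generated by the activity."""
        return sorted(e for e, a in self.generated_by.items() if a == activity)


# Illustrative node names only; they do not come from the paper's data.
graph = ProvenanceGraph()
graph.add_node("Activity", "multiple_alignment")
graph.add_node("Activity", "filtering")
graph.add_node("Entity", "alignment.aln")
graph.add_node("Entity", "alignment.log")

# One activity may appear in several WasGeneratedBy relations.
graph.was_generated_by("alignment.aln", "multiple_alignment")
graph.was_generated_by("alignment.log", "multiple_alignment")
print(graph.outputs_of("multiple_alignment"))

# A second generating activity for the same entity is rejected.
try:
    graph.was_generated_by("alignment.aln", "filtering")
except ValueError as err:
    print("rejected:", err)
```

Storing the relation as a dictionary keyed by entity makes the "one generating activity per entity" rule structural: the lookup either finds no prior activity, the same activity, or a conflict.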

Conclusion