Abstract

Many scientific experiments in bioinformatics are executed as computational workflows. It is frequently necessary to re-run an experiment under the original conditions so that it can be recognized and validated. Data provenance concerns the origin of data. Knowing the source of the data facilitates the understanding and analysis of results, because it documents the history and the path of the input data from the beginning to the end of an experiment. In this context, data provenance can therefore be used to make experiments traceable. This document presents AProvBio, an architecture that automatically captures the data provenance of scientific experiments in bioinformatics, using the PROV-DM provenance data model and storing the result in a graph database. The architecture captures provenance automatically in three forms: prospective, retrospective, and user-defined. It thus captures and stores information obtained during the execution of the data generation processes, together with user-defined information such as the features and versions of the programs used. A graph model based on PROV-DM was proposed for storing the data provenance. Because PROV-DM can be represented as a graph, this choice allows more natural modelling, lets queries be expressed at a more natural level, and supports the implementation of efficient algorithms for specific operations.
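To make the graph view of PROV-DM concrete, the following is a minimal sketch and not AProvBio's actual schema or implementation. It keeps an in-memory graph whose node types (entity, activity, agent) and relation names (used, wasGeneratedBy, wasAssociatedWith, wasDerivedFrom) come from PROV-DM. The class `ProvGraph`, its methods, the file names, and the program name `hypothetical-aligner` are assumptions introduced only for illustration.

```python
from collections import defaultdict


class ProvGraph:
    """Tiny in-memory graph using PROV-DM node types and relation names.

    Hypothetical illustration only; a real deployment would use a graph database.
    """

    NODE_TYPES = {"entity", "activity", "agent"}
    # Relations followed when tracing what an entity depends on.
    LINEAGE_RELATIONS = {"wasGeneratedBy", "used", "wasDerivedFrom"}

    def __init__(self):
        self.nodes = {}                 # node id -> {"type": ..., plus attributes}
        self.edges = defaultdict(list)  # source id -> [(relation, target id)]

    def add_node(self, node_id, node_type, **attrs):
        if node_type not in self.NODE_TYPES:
            raise ValueError(f"unknown PROV-DM type: {node_type}")
        self.nodes[node_id] = {"type": node_type, **attrs}

    def relate(self, source, relation, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before relating them")
        self.edges[source].append((relation, target))

    def lineage(self, entity_id):
        """Return the ids of all entities that the given entity depends on."""
        seen, stack = set(), [entity_id]
        while stack:
            current = stack.pop()
            for relation, target in self.edges.get(current, []):
                if relation in self.LINEAGE_RELATIONS and target not in seen:
                    seen.add(target)
                    stack.append(target)
        return {n for n in seen if self.nodes[n]["type"] == "entity"}


# Hypothetical workflow step: an alignment run that reads two input files.
graph = ProvGraph()
graph.add_node("reads.fastq", "entity")
graph.add_node("reference.fasta", "entity")
graph.add_node("alignment.bam", "entity")
# User-defined provenance: program name and version attached to the activity.
graph.add_node("align_run_1", "activity", program="hypothetical-aligner", version="1.0")
graph.add_node("researcher", "agent")

graph.relate("align_run_1", "used", "reads.fastq")
graph.relate("align_run_1", "used", "reference.fasta")
graph.relate("alignment.bam", "wasGeneratedBy", "align_run_1")
graph.relate("align_run_1", "wasAssociatedWith", "researcher")

print(sorted(graph.lineage("alignment.bam")))
```

In this sketch, a traceability question such as "which inputs produced this result?" becomes a traversal from the result entity along wasGeneratedBy and used edges. This is the kind of query the abstract describes as more natural to express on a graph.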
