AProvBio: An architecture for data provenance in bioinformatics workflows using graph database

Rodrigo Almeida, Maristela Holanda, Aleteia Araujo, Sergio Lifschitz, Klayton Castro, Maria Emilia Walter, Waldeyr Da Silva

doi:10.1109/bibm.2017.8217989

https://doi.org/10.1109/bibm.2017.8217989

Publication Date: Nov 1, 2017

Citations: 2

Affiliation: Universidade de Brasília

Abstract

Many scientific experiments in bioinformatics are executed as computational workflows. It is frequently necessary to re-run an experiment under the same conditions in which it was originally run in order to check and validate it. Data provenance concerns the origin of data. Knowing where data come from facilitates the understanding and analysis of results, because provenance details and documents the history and paths of the input data from the beginning to the end of an experiment. In this context, data provenance can therefore be applied to make experiments traceable. This document presents AProvBio, an architecture that automatically captures the data provenance of scientific experiments in bioinformatics, using the PROV-DM provenance data model and a graph database. The architecture automatically captures prospective and retrospective provenance, as well as user-defined provenance data. It thus captures and stores information obtained while the data-generating processes execute, together with user-defined information such as the features and versions of the programs used. A graph model based on PROV-DM was proposed for storing the provenance data. Because PROV-DM can be represented as a graph, this choice allows more natural modelling, queries expressed at a more natural level, and the implementation of efficient algorithms for specific operations.
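
The following is a minimal sketch, not AProvBio's implementation, of how a PROV-DM provenance graph for one workflow run might be represented and queried. It assumes a simple in-memory graph in Python in place of a real graph database; the class ProvGraph, its lineage method, the helper example_graph, and all file names, program names and versions are hypothetical. Nodes use the three PROV-DM core types (Entity, Activity, Agent) and edges use PROV-DM relation names pointing from effect to cause, so a breadth-first traversal of outgoing edges returns everything a result depends on. Only retrospective provenance and user-defined attributes are shown; prospective provenance (the workflow definition, which PROV-DM can model as a Plan) is omitted.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """A PROV-DM element: an Entity, Activity or Agent."""
    id: str
    kind: str
    attrs: dict = field(default_factory=dict)  # user-defined data, e.g. program version


class ProvGraph:
    """Hypothetical in-memory provenance graph; edges point from effect to cause."""

    KINDS = {"Entity", "Activity", "Agent"}
    # PROV-DM relation name -> (source type, target type)
    RELATIONS = {
        "used": ("Activity", "Entity"),
        "wasGeneratedBy": ("Entity", "Activity"),
        "wasDerivedFrom": ("Entity", "Entity"),
        "wasAssociatedWith": ("Activity", "Agent"),
        "wasAttributedTo": ("Entity", "Agent"),
    }

    def __init__(self):
        self.nodes = {}
        self.edges = []  # (source id, relation, target id) triples

    def add_node(self, node_id, kind, **attrs):
        if kind not in self.KINDS:
            raise ValueError(f"unknown PROV-DM type: {kind}")
        self.nodes[node_id] = Node(node_id, kind, attrs)

    def relate(self, src, relation, dst):
        # Reject edges whose endpoint types do not match the PROV-DM relation.
        expected = self.RELATIONS[relation]
        actual = (self.nodes[src].kind, self.nodes[dst].kind)
        if actual != expected:
            raise ValueError(f"{relation} expects {expected}, got {actual}")
        self.edges.append((src, relation, dst))

    def lineage(self, node_id):
        """Return every node that node_id transitively depends on (breadth-first)."""
        seen, queue = {node_id}, deque([node_id])
        while queue:
            current = queue.popleft()
            for src, _relation, dst in self.edges:
                if src == current and dst not in seen:
                    seen.add(dst)
                    queue.append(dst)
        seen.discard(node_id)
        return seen


def example_graph():
    """Two-step workflow run: trim reads, then assemble them (names are illustrative)."""
    g = ProvGraph()
    g.add_node("raw_reads.fastq", "Entity")
    g.add_node("clean_reads.fastq", "Entity")
    g.add_node("contigs.fasta", "Entity")
    g.add_node("trim_run_1", "Activity", program="trimmer", version="0.1")
    g.add_node("assemble_run_1", "Activity", program="assembler", version="2.3")
    g.add_node("researcher", "Agent")
    g.relate("trim_run_1", "used", "raw_reads.fastq")
    g.relate("clean_reads.fastq", "wasGeneratedBy", "trim_run_1")
    g.relate("assemble_run_1", "used", "clean_reads.fastq")
    g.relate("contigs.fasta", "wasGeneratedBy", "assemble_run_1")
    g.relate("contigs.fasta", "wasDerivedFrom", "clean_reads.fastq")
    g.relate("assemble_run_1", "wasAssociatedWith", "researcher")
    return g


if __name__ == "__main__":
    # ['assemble_run_1', 'clean_reads.fastq', 'raw_reads.fastq', 'researcher', 'trim_run_1']
    print(sorted(example_graph().lineage("contigs.fasta")))
```

Asking for the lineage of contigs.fasta walks back through both activities, the intermediate and raw reads, and the associated agent. The linear scan over edges is kept only for brevity; graph query languages typically express the same question as a single variable-length path match, which is part of why a graph representation suits provenance.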
