Abstract

The increasing amounts of data produced by automated scientific instruments require scalable data management platforms for storing, transforming and analyzing scientific data. At the same time, it is paramount for scientific applications to keep track of the provenance information for quality control purposes and to be able to re-trace workflow steps. Relational database systems are designed to efficiently manage and analyze large data volumes, and modern extensible database systems can also host complex data transformations as stored procedures. However, the relational model does not naturally support data provenance or lineage tracking. In this paper, we focus on providing data provenance management in relational databases for stored procedures. Our approach, called PSP, leverages the XML capabilities of SQL:2003 to keep track of the lineage of the data that has been processed by any stored procedure in a relational database as part of a scientific workflow. We show how this approach can be implemented in a state-of-the-art DBMS and discuss how the captured provenance data can be efficiently queried and analyzed.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call