Abstract

Data provenance is information about where data come from (provenance data) and how they transform (provenance transformation). Data provenance is widely used to evaluate data quality, trace errors, audit data, and understand references among data. Current studies on data provenance in relational database management systems (RDBMS) still have limitations in supporting full-featured SQL or procedural languages. With these challenges in mind, we present a formal definition of provenance data and provenance transformation for relational data. Then, we propose a solution to support data provenance in relational databases, including provenance graphs and provenance routes. Our method not only solves the complicated problem of modeling provenance in DBMS but also is capable of extending procedural languages in SQL. We also present ProvPg, a PostgreSQL-based prototype database system supporting data provenance in multiple granularities. ProvPg implements extraction, calculation, query, and visualization of provenance. We perform TPC-H tests for ProvPg and PostgreSQL, respectively. Experimental results show that ProvPg addresses the vision of supporting data provenance with little extra computation overhead for the execution engine, which indicates that our model is applicable to lineage tracing applications.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call