Abstract

Modern data management must handle data from many sources of varying quality. Supporting data provenance at the system level, so that users can see where data comes from and how it was derived, has therefore become a critical research topic. Annotation is one approach to tracking provenance. However, storing fine-grained annotations can be expensive, because the complete annotations for the data may require more storage space than the data itself. In this paper, we propose a framework for storing provenance information about data derived via relational queries. The framework uses provenance trees that match the query structure, which avoids redundant storage of information about the derivation process. Within this framework, we develop a series of storage optimization methods for relational queries that make good choices of the query tree nodes at which provenance information should be stored. Our optimization algorithms run in time polynomial in the query size and linear in the size of the provenance, enabling provenance tracking and optimization without incurring large overheads. The framework offers a new approach to data tracing and has a wide range of applications.
