Abstract

This paper describes the design and implementation of the Data Quality Query System (DQ2S), a query processing framework and tool incorporating data quality profiling functionality in the processing of queries involving quality-aware query language extensions. DQ2S supports the combination of performance and quality-oriented query optimizations, and a query processing platform that enables advanced data profiling queries to be formulated based on well established query language constructs, often used to interact with relational database management systems. DQ2S encompasses a declarative query language and a data model that provides users with the capability to express constraints on the quality of query results as well as query quality-related information; a set of algebraic operators for manipulating data quality-related information, and optimization heuristics. The proposed query language and algebra represent seamless extensions to SQL and relational database engines, respectively. The constructs of the proposed data model are implemented at the user’s view level and are internally mapped into relational model constructs. The quality-aware extensions and features are extremely useful when users need to assess the quality of relational data sets and define quality constraints for acceptable data prior to using candidate data sources in decision support systems and conducting big data analytical tasks.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call