Abstract

Data quality issues have been topical for many decades. However, a unified data quality theory has not yet been proposed, since many concepts associated with the term “data quality” are not clearly defined. This paper proposes a user-oriented data quality theory based on clearly defined concepts. The concepts are defined using three groups of domain-specific languages (DSLs): (1) the first group uses the concept of a data object to describe the data to be analysed, (2) the second group describes the data quality requirements, and (3) the third group describes the process of data quality evaluation. The proposed approach proved to be simple, yet very effective in identifying data defects, despite differences in the structure of the data sets and the complexity of the data. Practical testing of the approach demonstrated several advantages: (a) the graphical data quality model allows data quality to be defined even by people who are neither IT nor data quality professionals, (b) the data quality model is independent of the information system that accumulated the data, i.e., the approach lets users analyse “third-party” data, and (c) data quality can be described at at least two levels of abstraction: informally, using natural language, or formally, including executable program routines or SQL statements.
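The separation into a data object, quality requirements, and an evaluation process can be illustrated with a small sketch. The DSLs in the approach are graphical, so the following Python code is only a hypothetical illustration and not the authors' notation: the names `DataObject`, `Requirement`, and `evaluate`, and the sample records, are assumptions made for this example. Each requirement carries an informal description (natural language) and a formal, executable check, mirroring the two levels of abstraction. The data object lists only the attributes under analysis, so records from a third-party source may contain extra fields that are simply ignored.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class DataObject:
    """Hypothetical data object: names the attributes to be analysed."""
    name: str
    attributes: list[str]


@dataclass
class Requirement:
    """Hypothetical quality requirement with an informal and a formal level."""
    description: str                          # informal: natural language
    check: Callable[[dict[str, Any]], bool]   # formal: executable routine


def evaluate(obj: DataObject, records: list[dict[str, Any]],
             reqs: list[Requirement]) -> list[tuple[int, str]]:
    """Hypothetical evaluation process: return (record index, violated requirement)."""
    defects = []
    for i, rec in enumerate(records):
        # Project the record onto the data object; missing attributes become None.
        values = {a: rec.get(a) for a in obj.attributes}
        for req in reqs:
            if not req.check(values):
                defects.append((i, req.description))
    return defects


# Usage with illustrative data (assumed, not taken from the paper).
person = DataObject("Person", ["name", "birth_year"])
reqs = [
    Requirement("Name must be filled in", lambda r: bool(r["name"])),
    Requirement("Birth year must be between 1900 and 2025",
                lambda r: r["birth_year"] is not None and 1900 <= r["birth_year"] <= 2025),
]
rows = [
    {"name": "Anna", "birth_year": 1985, "source": "registry"},  # extra field ignored
    {"name": "", "birth_year": 1990},
    {"name": "Juris", "birth_year": None},
]
print(evaluate(person, rows, reqs))
# [(1, 'Name must be filled in'), (2, 'Birth year must be between 1900 and 2025')]
```

Keeping the data object separate from the requirements means the same requirements could, in principle, be reused against data sets with different physical structures, as long as they can be projected onto the same attributes.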
