Abstract

Data quality is a complex concept defined by several dimensions, such as accuracy, currency, completeness, and consistency (Wang & Strong, 1996). Recent research has highlighted the importance of data quality issues in various contexts. In particular, in environments characterized by extensive data replication, high data quality is a strict requirement. Among such environments, this article focuses on cooperative information systems (CISs): distributed and heterogeneous information systems that cooperate by sharing information, constraints, and goals (Mylopoulos & Papazoglou, 1997). Quality of data is a necessary requirement for a CIS. Indeed, a system in the CIS will not readily exchange data with another system without knowing the quality of the data that system provides, which reduces cooperation. Moreover, when the quality of exchanged data is poor, the overall data quality in the CIS progressively deteriorates. On the other hand, the high degree of data replication that characterizes a CIS can be exploited to improve data quality, since different copies of the same data can be compared in order to detect quality problems and possibly resolve them. Scannapieco, Virgillito, Marchetti, Mecella, and Baldoni (2004) and Mecella et al. (2003) describe the DaQuinCIS architecture, which manages data quality in cooperative contexts in order to avoid the spread of low-quality data and to exploit data replication to improve the overall quality of cooperative data. In this article we describe the design of a component of our system, named the quality factory, whose purpose is to evaluate the quality of the XML data sources of the cooperative system.
While the need for such a component had been previously identified, this article is the first to present the design of the quality factory, and it proposes an overall methodology for evaluating the quality of XML data sources. Quality values measured by the quality factory are used by the data quality broker, which has two main functionalities: (1) quality brokering, which allows users to select data in the CIS according to their quality; and (2) quality improvement, which diffuses the best-quality copies of data in the CIS.

