Abstract

Data quality will become a significant issue as data warehousing grows increasingly popular. This paper investigates and analyzes data quality issues in data warehouse environments. We present an attribute-based metadata model for identifying data quality, and we introduce a four-phase process for managing data quality throughout the life cycle of a data warehouse. Using this process, overall data quality conditions can be identified, and related information can be provided for determining whether the data meet "fit to use" criteria and whether they need to be improved. Furthermore, we use a cost/benefit evaluation model to identify poor-quality data and to set improvement priorities given limited resources. Our approach allows system developers to document relevant quality information as metadata, which can be associated with the whole life cycle of the data warehouse. Quality metadata can not only enrich the interpretation of attribute data but also provide diagnostic information for finding the causes and sources of errors. In addition, the cost/benefit evaluation model may provide a foundation for the quantitative analysis of data quality.
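The abstract does not spell out the cost/benefit evaluation model. As a rough illustration of the general idea of setting improvement priorities under limited resources, the sketch below ranks data quality problems by an estimated benefit/cost ratio and picks them greedily within a budget. This is a swapped-in greedy heuristic, not the paper's model; the `QualityIssue` record, its `cost` and `benefit` fields, the `prioritize` function, and the example attributes and numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QualityIssue:
    """Hypothetical record of a data quality problem on one warehouse attribute."""
    attribute: str   # hypothetical attribute name
    cost: float      # assumed estimated cost of improving this attribute (> 0)
    benefit: float   # assumed estimated benefit of improving it


def prioritize(issues, budget):
    """Greedy heuristic (an assumption, not the paper's method):
    skip issues whose benefit does not exceed their cost, rank the rest by
    benefit/cost ratio (highest first), and select each one whose cost still
    fits within the remaining budget."""
    for issue in issues:
        if issue.cost <= 0:
            raise ValueError(f"cost must be positive for {issue.attribute!r}")
    worthwhile = [i for i in issues if i.benefit > i.cost]
    ranked = sorted(worthwhile, key=lambda i: i.benefit / i.cost, reverse=True)
    selected, spent = [], 0.0
    for issue in ranked:
        if spent + issue.cost <= budget:
            selected.append(issue)
            spent += issue.cost
    return selected


# Illustrative (made-up) estimates.
issues = [
    QualityIssue("customer_address", cost=4.0, benefit=10.0),
    QualityIssue("order_date", cost=2.0, benefit=3.0),
    QualityIssue("product_code", cost=5.0, benefit=4.0),  # benefit < cost: skipped
]
chosen = [i.attribute for i in prioritize(issues, budget=6.0)]
print(chosen)
```

A greedy ratio ranking is simple and transparent, but it is only an approximation: choosing the best subset under a hard budget is a knapsack problem. A real cost/benefit model would also need defensible estimates of each improvement's cost and benefit.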
