A Rule Management System for Knowledge Based Data Cleaning

Louardi Bradji, Mahmoud Boufaida

doi:10.4236/iim.2011.36028

Abstract

In this paper, we propose a knowledge-based rule management system for data cleaning. The system combines features of rule based systems and rule based data cleaning frameworks. It has three main advantages. First, it proposes a strong, unified rule form based on first-order structure that permits the representation and management of all types of rules, and of their quality, via a set of characteristics. Second, it increases the quality of the rules, which conditions the quality of data cleaning. Third, it uses an appropriate knowledge acquisition process, which is the weakest task in current rule-based and knowledge-based systems. Because several research works have shown that data cleaning is driven by domain knowledge rather than by data, we have identified and analyzed the properties that distinguish knowledge and rules from data in order to better determine most of the components of the proposed system. To illustrate the system, we also present a first experiment, a case study in the health sector, in which we demonstrate how the system helps improve data quality. The autonomy, extensibility and platform independence of the proposed rule management system facilitate its incorporation into any system concerned with data quality management.

Highlights

Data quality (DQ) has always been an important issue and is even more so today
The Rule-Based approaches for Data Cleaning (RBDC) proposed for both academic research and practical applications have certain persistent limitations related to the following aspect of rule design: no practical methodology for Rule Based Systems (RBS) is acceptable for RBDC systems, because these methodologies address only rule production and do not ensure the quality of the rules
The development of an appropriate RBS for Data Cleaning (DC) is crucial for the final success of RBDC: the rule representation should have enough expressive power to express all of the required rules, and should make it easy to handle and manage each rule and its quality

Summary

Introduction

Data quality (DQ) has always been an important issue and is even more so today. Research works look at the role of Data Cleaning (DC) tools in helping improve DQ and clarify the need to take an enterprise-wide approach to DQ management, which is increasingly complex, open and dynamic [1,2]. There is a wide variety of DC tools. Their functionality can be classified as follows: Declarative DC and Rule-Based approaches for DC (RBDC). Although Rule Based Systems (RBS), which encode knowledge as rules and are used to process complicated tasks, have been firmly established for many years, they have not been formally and adequately addressed for DC tasks. As our objective is to enhance DQ by applying a rule-based approach in DC, it is necessary to present some works related to Knowledge Based Systems, Rule Based Systems and Rule-Based Data Cleaning. Knowledge could be obtained from domain experts, raw data, documents, personal knowledge, business models and/or learning by experience [12,13].
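
To make the idea of encoding cleaning knowledge as rules, each carrying information about its own quality, a little more concrete, here is a minimal sketch in Python. It is not the rule form proposed in the paper, which is not reproduced on this page. The class CleaningRule, the function apply_rules, the confidence and source fields, the min_confidence threshold and the sample patient record are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


@dataclass
class CleaningRule:
    """Hypothetical condition/action rule with quality information attached."""
    name: str
    condition: Callable[[Record], bool]  # decides whether the rule fires
    action: Callable[[Record], Record]   # returns a repaired copy of the record
    confidence: float = 1.0              # assumed quality characteristic in [0, 1]
    source: str = "domain expert"        # assumed provenance of the knowledge


def apply_rules(record: Record, rules: List[CleaningRule],
                min_confidence: float = 0.5) -> Tuple[Record, List[str]]:
    """Apply, in order, every sufficiently trusted rule whose condition holds."""
    cleaned = dict(record)  # work on a copy so the input is left untouched
    fired: List[str] = []
    for rule in rules:
        if rule.confidence < min_confidence:
            continue  # skip rules whose recorded quality is too low
        if rule.condition(cleaned):
            cleaned = rule.action(cleaned)
            fired.append(rule.name)
    return cleaned, fired


# Illustrative rules for an invented health record (not taken from the paper).
rules = [
    CleaningRule(
        name="negative_age_to_missing",
        condition=lambda r: r.get("age") is not None and r["age"] < 0,
        action=lambda r: {**r, "age": None},
        confidence=0.9,
    ),
    CleaningRule(
        name="normalize_sex_code",
        condition=lambda r: isinstance(r.get("sex"), str)
        and r["sex"].strip().lower() in {"m", "male"},
        action=lambda r: {**r, "sex": "M"},
        confidence=0.8,
    ),
    CleaningRule(
        name="low_quality_guess",
        condition=lambda r: True,
        action=lambda r: {**r, "city": "unknown"},
        confidence=0.2,  # below the threshold, so this rule never runs
    ),
]

patient = {"id": 17, "age": -4, "sex": " male "}
cleaned, fired = apply_rules(patient, rules)
print(cleaned)
print(fired)
```

Keeping the quality score on the rule itself, rather than in the cleaning code, means a low-quality rule can be excluded or revised without touching the engine. This is one simple way to read the requirement that both the rule and its quality be easy to manage.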

Journal: Intelligent Information Management
Publication Date: Jan 1, 2011
Citations: 29
License type: CC BY 4.0
