Abstract

The quality of data can only be improved by cleaning data prior to loading into the data warehouse as correctness of data is essential for well-informed and reliable decision making. Data warehouse is the only viable solution that can bring that dream into a reality. The quality of the data can only be produced by cleaning data prior to loading into data warehouse. Data Cleaning is a very important process of the data warehouse. It is not a very easy process as many different types of unclean data can be present. So correctness of data is essential for well-informed and reliable decision making. Also, whether a data is clean or dirty is highly dependent on the nature and source of the raw data. Many attempts have been made till now to clean the data using different types of algorithms. In this paper an attempt has been made to provide a hybrid approach for cleaning data which combines modified versions of PNRS, Transitive closure algorithms and Semantic Data Matching algorithm can be applied to the data to get better results in data corrections. General Terms Data Warehouse, Data Cleaning.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call