Abstract

Data cleaning is an important part of the data warehouse management process. It is not an easy task, because many different types of unclean data (bad data, incomplete data, typos, etc.) can be present. Moreover, whether a given piece of data is clean or dirty depends heavily on the nature and source of the raw data. Many attempts have been made to clean data using blocking algorithms, phonetic algorithms, and similar techniques. This paper presents HADCLEAN, a hybrid approach to data cleaning that combines modified versions of the PNRS and transitive closure algorithms.
