Abstract

Information sources on the Web are stored in different text formats and contain various inconsistencies. Data from many online sources do not contain enough information to link records accurately. To link records from different data sources, a system must identify the entities those sources have in common. The major challenges in record linkage are therefore computational complexity and linkage accuracy. To reduce the number of record pairs that must be compared, record linkage systems use similarity search techniques to find candidate similar records, and various searching methods have been used for this purpose. In this paper, we propose a record linkage framework that focuses on standardization and enhances the searching method by adopting an advanced cluster-based searching method called Hierarchical Bayesian Clustering (HBC), which both makes record pair comparison more efficient and improves record linkage accuracy. The method places similar records into the same cluster, which restricts the search scope for record comparison and enhances matching accuracy.
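
The way clustering restricts the search scope can be illustrated with a minimal sketch. It is not the HBC method proposed in the paper: the clustering step is a simple greedy stand-in based on Jaccard similarity of tokens, and the function names, the toy records, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from itertools import combinations


def tokens(record):
    """Standardize a record: lowercase, drop punctuation, split into a token set."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in record)
    return frozenset(cleaned.split())


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_clusters(records, threshold=0.5):
    """Put each record in the first cluster whose seed record is similar enough.

    Hypothetical stand-in for a real clustering method such as HBC.
    """
    clusters = []  # each cluster is a list of record indices; the first is the seed
    for i, rec in enumerate(records):
        for cluster in clusters:
            if jaccard(tokens(rec), tokens(records[cluster[0]])) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no similar cluster found: start a new one
    return clusters


def candidate_pairs(clusters):
    """Only records that share a cluster are compared with each other."""
    return [pair for cluster in clusters for pair in combinations(cluster, 2)]


# Toy records (assumed data, not from the paper).
records = [
    "John Smith, 12 Main Street",
    "john smith 12 main st.",
    "Mary Jones, 4 Oak Avenue",
    "Jones, Mary - 4 Oak Ave",
    "Peter Brown, 9 Hill Road",
]
clusters = greedy_clusters(records)
pairs = candidate_pairs(clusters)
all_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} candidate pairs instead of {all_pairs}")
```

On these five toy records, comparing only within clusters leaves 2 candidate pairs instead of the 10 pairs an exhaustive comparison would need.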
