Abstract

Entity resolution is the task of identifying records that refer to the same real-world object across multiple data sets. It is an important step in data cleaning and data integration applications. When the data is large, entity resolution becomes complex and time-consuming. An end-to-end entity resolution pipeline involves stages such as blocking (efficiently identifies candidate duplicates), detailed comparison (refines the blocking output) and clustering (identifies the sets of records that may refer to the same entity). This paper proposes a feedback-based approach that optimizes each phase of the complete entity resolution pipeline and uses supervised meta-blocking in the blocking stage, with the aim of improving the performance of entity resolution for big data.
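
To make the three stages concrete, the following is a minimal sketch of a blocking, comparison and clustering pipeline. It is not the method proposed in the paper: it uses plain token blocking in place of supervised meta-blocking, has no feedback loop, and the function names (`block`, `compare`, `cluster`, `resolve`), the Jaccard similarity measure and the 0.5 threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def tokens(text):
    """Return the set of lowercased whitespace tokens of a record."""
    return set(text.lower().split())


def block(records):
    """Blocking (assumed: token blocking). Records that share at least
    one token become a candidate pair, avoiding an all-pairs comparison."""
    index = defaultdict(set)
    for rid, text in records.items():
        for tok in tokens(text):
            index[tok].add(rid)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def compare(records, pairs, threshold=0.5):
    """Detailed comparison (assumed: Jaccard similarity). Keeps only the
    candidate pairs whose token overlap meets the threshold."""
    matches = []
    for a, b in pairs:
        ta, tb = tokens(records[a]), tokens(records[b])
        if len(ta & tb) / len(ta | tb) >= threshold:
            matches.append((a, b))
    return matches


def cluster(records, matches):
    """Clustering (assumed: connected components via union-find).
    Records linked by a chain of matches end up in the same cluster."""
    parent = {rid: rid for rid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in matches:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for rid in records:
        groups[find(rid)].add(rid)
    return sorted(sorted(g) for g in groups.values())


def resolve(records, threshold=0.5):
    """Run the three stages end to end on a dict of {record_id: text}."""
    pairs = block(records)
    return cluster(records, compare(records, pairs, threshold))
```

Calling `resolve` on a dictionary that maps record identifiers to text returns a sorted list of clusters, each a sorted list of identifiers. In a feedback-based design such as the one the abstract describes, the threshold and the blocking step would presumably be tuned from labelled examples rather than fixed as they are here.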
