Abstract

Entity resolution (ER), also called entity identification, is the process of identifying records that refer to the same real-world entity. It is one of the most important problems in data cleaning and arises in many applications, such as information integration and information retrieval. Entity resolution is a particular challenge when integrating data from different sources, and as the volume of data on the web and in databases increases, data integration is becoming more expensive and challenging than ever before. For example, different persons may have an identical name or other characteristics in common, so it is necessary to correctly identify such complex records that refer to the same real-world entity. Traditional entity identification approaches obtain a result by comparing the similarity of records, assuming that records referring to the same entity are more similar to each other. However, this property may not hold, so traditional ER approaches cannot identify records correctly in some cases. The proposed framework develops a class of ER rules for entity identification that are capable of expressing the complex matching conditions between records and entities. By incorporating the decision tree concept into the rule generation algorithm, the proposed framework outperforms the traditional method. In the proposed method, applying the rules to each record makes it possible to identify which entity the record refers to.
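
To make the last point concrete, the following is a minimal sketch of applying ER rules to records, not the framework's actual rule syntax or its decision-tree-based rule generation. The `Rule` class, the `resolve` function, the attribute names (`name`, `topic`), and the entity labels are all hypothetical and chosen only for illustration. It shows two records with an identical name being assigned to different entities, a case that pure similarity comparison would tend to merge.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    """Hypothetical ER rule: if `condition` holds, the record refers to `entity`."""
    entity: str
    condition: Callable[[Record], bool]


def resolve(record: Record, rules: List[Rule]) -> Optional[str]:
    """Return the entity of the first rule whose condition matches, else None."""
    for rule in rules:
        if rule.condition(record):
            return rule.entity
    return None


# Illustrative, hand-written rules (assumed, not generated by the paper's algorithm).
RULES = [
    Rule("person_1", lambda r: r.get("name") == "Wei Wang" and "database" in r.get("topic", "")),
    Rule("person_2", lambda r: r.get("name") == "Wei Wang" and "biology" in r.get("topic", "")),
]

records = [
    {"name": "Wei Wang", "topic": "database systems"},
    {"name": "Wei Wang", "topic": "computational biology"},
    {"name": "Wei Wang", "topic": "art history"},
]

# Same name, but the rules separate the records by their other attributes.
assignments = [resolve(r, RULES) for r in records]
```

A record that satisfies no rule is left unassigned (`None`) rather than forced onto the most similar entity; that is a design choice of this sketch, not something the abstract specifies.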

