Abstract

Entity Resolution (ER) is a prerequisite for several Web applications, including enhancing semantic search and information extraction from the Web, strengthening the Web of Data by interlinking entity descriptions from autonomous sources, and supporting reasoning over related ontologies. In designing an ER system, it is assumed that each entity profile consists of a uniquely identified set of attribute-value pairs, that each entity profile corresponds to a single real-world object, and that two matching profiles are detected only if they co-occur in at least one block. ER is an inherently quadratic problem (i.e., O(n²)), since every entity must be compared with every other entity. As a result, existing ER techniques fail to scale to large entity collections such as Web data. The best-known solution for large-scale ER in the literature is blocking, an approximate approach in which similar entities are grouped into blocks and comparisons are limited to entities within the same block. This paper discusses the process of entity resolution and the types of entity resolution in relational and Web data. It then reviews the approaches to entity resolution systems proposed in earlier research. The data integration, block building, and block processing phases are discussed, along with the challenges involved in designing an efficient ER system. The paper concludes with the measures required to evaluate entity resolution approaches.
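
To illustrate how blocking reduces the quadratic comparison cost, the following is a minimal sketch of token blocking, one common schema-agnostic block-building scheme. It is not claimed to be the method of any particular work reviewed here. The function names (token_blocking, candidate_pairs) and the sample profiles are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(profiles):
    """Build one block per token: every profile whose attribute values
    contain that token is placed in the block (attribute names are ignored)."""
    blocks = defaultdict(set)
    for profile_id, attributes in profiles.items():
        for value in attributes.values():
            for token in str(value).lower().split():
                blocks[token].add(profile_id)
    # A block with a single profile yields no comparisons, so drop it.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Collect the distinct pairs of profiles that co-occur in at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical sample data: four profiles with heterogeneous attribute names.
profiles = {
    "p1": {"name": "John Smith", "city": "Boston"},
    "p2": {"fullName": "Smith John", "location": "Boston MA"},
    "p3": {"name": "Alice Brown", "city": "Denver"},
    "p4": {"title": "A. Brown", "town": "Denver"},
}

blocks = token_blocking(profiles)
pairs = candidate_pairs(blocks)

n = len(profiles)
brute_force = n * (n - 1) // 2  # all-pairs comparisons without blocking

print(f"{len(pairs)} candidate comparisons instead of {brute_force}")
```

In this sketch, only pairs that share at least one token are compared, which reflects the assumption that matching profiles co-occur in at least one block. Because blocking is approximate, a matching pair that shares no token would be missed.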

