Abstract

The amount and diversity of data in the Semantic Web has grown considerably. RDF datasets have proportionally more problems than relational datasets due to the way data are published, usually without formal criteria. Entity Resolution (ER) is an important issue, related to a well-known task in many research communities, that aims at finding all representations that refer to the same entity in different datasets. Yet, it is still an open problem. Blocking methods avoid the quadratic complexity of the brute-force approach by clustering entities into blocks and limiting the comparison of entity descriptions to entity pairs within the same block. In recent years only a few blocking methods have been conceived to deal with RDF data, and novel blocking techniques are required for dealing with noisy and heterogeneous data in the Web of Data. In this paper we present a blocking scheme, CER-Blocking, which is based on an inverted index structure and uses different data evidence from a triple, aiming to maximize its effectiveness. To overcome problems of data quality, or even the complete absence of data, we use two blocking key definitions. This scheme is part of an ER approach based on a relational learning algorithm that addresses the problem by statistical approximation. It was empirically evaluated on real and synthetic datasets that are part of consolidated benchmarks found in the literature.
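
To make the general idea of inverted-index blocking concrete, the following is a minimal Python sketch of generic token-based blocking over RDF-like triples. It is not CER-Blocking: the triple format (plain string tuples with quoted literals, loosely N-Triples-like), the two key definitions (tokens of literal objects, and tokens of the subject URI's local name as a fallback), and the names `tokens`, `local_name`, `build_index` and `candidate_pairs` are hypothetical choices made only for illustration. Each key maps to the set of subjects that produce it, and only subjects sharing at least one block become candidate pairs.

```python
import re
from collections import defaultdict
from itertools import combinations


def tokens(text: str) -> set[str]:
    """Lower-cased alphanumeric tokens of a string."""
    return {t for t in re.split(r"[^0-9a-z]+", text.lower()) if t}


def local_name(uri: str) -> str:
    """Last URI segment after '/', '#' or ':' (e.g. '.../Alan_Turing' -> 'Alan_Turing')."""
    return re.split(r"[/#:]", uri)[-1]


def build_index(triples):
    """Inverted index mapping each blocking key to the set of subjects producing it.

    Two hypothetical key definitions (illustrative only, not CER-Blocking's):
      1. tokens of literal objects (literals are quoted strings in this sketch);
      2. tokens of the subject URI's local name, a fallback when literals are poor or missing.
    """
    index = defaultdict(set)
    for subject, _predicate, obj in triples:
        for key in tokens(local_name(subject)):  # key definition 2
            index[key].add(subject)
        if obj.startswith('"'):  # key definition 1: literal values only
            for key in tokens(obj):
                index[key].add(subject)
    return index


def candidate_pairs(index):
    """Entity pairs co-occurring in at least one block (repeats across blocks removed)."""
    pairs = set()
    for block in index.values():
        pairs.update(combinations(sorted(block), 2))
    return pairs


# Toy input: two hypothetical datasets describing overlapping people.
triples = [
    ("http://a.org/Alan_Turing", "name", '"Alan Turing"'),
    ("http://b.org/p42", "label", '"Turing, Alan M."'),
    ("http://a.org/Ada_Lovelace", "name", '"Ada Lovelace"'),
]

index = build_index(triples)
pairs = candidate_pairs(index)
entities = {s for s, _, _ in triples}
brute_force = len(entities) * (len(entities) - 1) // 2
print(f"{len(pairs)} candidate pair(s) instead of {brute_force}")
print(pairs)
```

With three entities, brute force would compare all three pairs, while the index yields only the pair that shares the keys "alan" and "turing". A practical scheme would typically also prune oversized blocks built from very frequent tokens and, for clean-clean ER, keep only cross-dataset pairs; this sketch does neither.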
