With the continuous development of the Semantic Web, and especially of the Web of Data, many knowledge bases described by ontologies are independently created and added to the Linked Open Data (LOD) cloud every day. A major challenge for the LOD paradigm is to discover resources that refer to the same real-world object, so that web resources can be interlinked to support large-scale data integration and sharing. Instance matching is a promising solution in this context. It aims to link co-referent instances that belong to heterogeneous knowledge bases with owl:sameAs links. Several state-of-the-art approaches to this problem rely on prior schema-level matching, and they therefore remain limited by heterogeneity at the property level. In this paper, we propose a schema-free, scalable and efficient instance matching approach that does not depend on schema-level matching results. We transform the instance matching problem into a document similarity problem. We then solve it with a clustering technique that uses an Ascendant Hierarchical Clustering algorithm to group similar instances into the same clusters. Furthermore, we design multiple validation patterns that use structural information to validate the obtained mappings and eliminate wrong ones. Experiments on the instance matching track of the Ontology Alignment Evaluation Initiative (OAEI) show that our approach achieves strong results compared with several systems that participated in OAEI 2019, OAEI 2020 and OAEI 2021.
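The abstract does not specify how an instance is turned into a document, which similarity measure is used, or which linkage criterion the Ascendant Hierarchical Clustering (AHC) step applies. The following is therefore a minimal sketch under explicit assumptions and not the authors' implementation. It concatenates each instance's literal values into a bag of words and ignores property names, which is what makes the sketch schema-free. It weights the documents with TF-IDF and compares them with cosine similarity. It then merges them bottom-up (agglomeratively) with average linkage until no pair of clusters reaches a hypothetical similarity `threshold`. Any source/target pair that ends up in the same cluster becomes a candidate owl:sameAs link. The paper's structural validation patterns are not modeled. All function names and the example data are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lower-case alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def instance_to_document(literals):
    """Hypothetical: build a bag of words from an instance's literal values only.
    Property names are ignored, so no schema-level alignment is needed."""
    return tokenize(" ".join(literals))


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts) with a smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for instances without literals
        vectors.append({
            tok: (c / length) * (math.log((1 + n) / (1 + df[tok])) + 1)
            for tok, c in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def agglomerative_clusters(vectors, threshold):
    """Bottom-up clustering with average linkage (an assumed criterion): repeatedly
    merge the most similar pair of clusters until the best similarity < threshold."""
    sim = {(i, j): cosine(vectors[i], vectors[j])
           for i, j in combinations(range(len(vectors)), 2)}

    def linkage(c1, c2):
        total = sum(sim[(min(a, b), max(a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best, bi, bj = -1.0, -1, -1
        for i, j in combinations(range(len(clusters)), 2):
            s = linkage(clusters[i], clusters[j])
            if s > best:
                best, bi, bj = s, i, j
        if best < threshold:
            break
        clusters[bi] = clusters[bi] + clusters[bj]
        del clusters[bj]  # bj > bi, so index bi stays valid
    return clusters


def candidate_links(source, target, threshold=0.3):
    """source/target: dicts mapping an instance IRI to its list of literal values.
    Returns sorted (source_iri, target_iri) pairs that share a cluster."""
    ids = list(source) + list(target)
    n_src = len(source)
    docs = [instance_to_document(lits)
            for lits in list(source.values()) + list(target.values())]
    vectors = tfidf_vectors(docs)
    links = []
    for cluster in agglomerative_clusters(vectors, threshold):
        src = [ids[k] for k in cluster if k < n_src]
        tgt = [ids[k] for k in cluster if k >= n_src]
        links.extend((s, t) for s in src for t in tgt)
    return sorted(links)


if __name__ == "__main__":
    source = {"kb1:Paris": ["Paris", "capital of France", "2148000"],
              "kb1:Rome": ["Rome", "capital of Italy"]}
    target = {"kb2:paris_fr": ["Paris France capital city"],
              "kb2:roma": ["Roma Rome Italy capital"]}
    print(candidate_links(source, target))
    # prints [('kb1:Paris', 'kb2:paris_fr'), ('kb1:Rome', 'kb2:roma')]
```

In this toy example, the two knowledge bases describe the same cities with different property layouts and spellings. The shared literal tokens are still enough to place each pair in a common cluster, and the cross-cluster average linkage stays below the threshold. The naive merge loop recomputes every cluster-pair linkage on each iteration, so its cost grows at least cubically with the number of instances. A system that runs at OAEI scale would likely need blocking or a more efficient AHC implementation.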