Linking Datasets Using Semantic Textual Similarity

John P. McCrae, Paul Buitelaar

doi:10.2478/cait-2018-0010

Open Access

Journal: Cybernetics and Information Technologies	Publication Date: Mar 1, 2018
Citations: 16	License type: CC BY-NC-ND 3.0

Affiliation: Ollscoil na Gaillimhe – University of Galway

#Semantic Textual Similarity #String Similarity

Abstract

Linked data has been widely recognized as an important paradigm for representing data, and one of the most important aspects of supporting its use is the discovery of links between datasets. Many datasets contain a significant amount of textual information in the form of labels, descriptions and documentation about their elements, and precise linking depends fundamentally on applying semantic textual similarity to this text. However, most linking tools so far rely only on simple string similarity metrics such as Jaccard scores. We present an evaluation of some metrics that have performed well in recent semantic textual similarity evaluations and apply these to linking existing datasets.
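To illustrate the kind of simple string similarity metric mentioned above, the following is a minimal sketch of a token-level Jaccard score between two labels. The function name, the whitespace tokenization, and the example labels are assumptions made for illustration; they are not taken from the paper or from any particular linking tool.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard score over lowercased whitespace tokens (illustrative tokenization)."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        # Convention chosen here: two empty labels count as identical.
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical labels: the first pair shares tokens, the second pair is a
# paraphrase with no tokens in common.
same_words = jaccard("city of Galway", "Galway city")
paraphrase = jaccard("motor vehicle", "automobile")
```

Under these assumptions the paraphrase pair scores zero even though the labels mean nearly the same thing, which is the gap that semantic textual similarity metrics aim to close.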

Full Text