Abstract

Ontology lexicalization aims to provide information about how the elements of an ontology are verbalized in a given language. Most ontology lexicalization techniques require labeled training data, which are usually generated automatically using distant supervision. This technique rests on the assumption that if a sentence contains the two entities of a knowledge base triple, it expresses the relation stated in that triple. This assumption is simplistic and can lead to incorrect mappings between sentences and knowledge base triples: a sentence may contain both entities of a triple while expressing a relation different from the one stated in the triple. Such incorrect mappings in turn produce incorrect ontology lexicon entries. In this paper, a new method, called denoising distant supervision, is presented to reduce the number of incorrect mappings between sentences and triples by taking into account the semantic similarity between a sentence and the label of the triple's predicate. For this purpose, several semantic similarity measures are proposed that use pre-trained word embeddings to compute the similarity between a sentence and the predicate label. Sentences with low semantic similarity are then removed from the mapping. The proposed solution is evaluated within the M-ATOLL framework. The experimental results show that the quality of the ontology lexicon generated with the proposed solution is improved compared to state-of-the-art techniques.
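
The filtering step described above can be illustrated with a minimal sketch: both the sentence and the label of the triple's predicate are represented as averaged pre-trained word vectors, their cosine similarity is computed, and candidate sentences below a threshold are discarded. The embedding table, tokenization, and threshold value below are hypothetical placeholders for illustration only, not the paper's actual resources or settings.

```python
# Sketch of similarity-based denoising of distantly supervised sentence-triple
# mappings. All embeddings and the threshold are toy placeholders.
import numpy as np

# Hypothetical pre-trained word embeddings (in practice, e.g. GloVe or word2vec).
EMBEDDINGS = {
    "won":   np.array([0.9, 0.1, 0.0]),
    "award": np.array([0.8, 0.2, 0.1]),
    "prize": np.array([0.7, 0.3, 0.1]),
    "born":  np.array([0.0, 0.9, 0.2]),
    "in":    np.array([0.1, 0.1, 0.1]),
    "the":   np.array([0.1, 0.1, 0.1]),
}

def embed(text: str) -> np.ndarray:
    """Average the word vectors of the tokens that have an embedding."""
    vectors = [EMBEDDINGS[t] for t in text.lower().split() if t in EMBEDDINGS]
    return np.mean(vectors, axis=0) if vectors else np.zeros(3)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def denoise(sentences, predicate_label, threshold=0.8):
    """Keep only sentences sufficiently similar to the predicate label."""
    label_vec = embed(predicate_label)
    return [s for s in sentences if cosine(embed(s), label_vec) >= threshold]

# Candidate sentences that contain both entities of a hypothetical triple
# <X, awardWon, Y>; only the first actually expresses the relation.
candidates = [
    "X won the award Y",
    "X born in the Y prize",
]
print(denoise(candidates, "award won"))  # the noisy second sentence is dropped
```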
