Enhancing the sentence similarity measure by semantic and syntactico-semantic knowledge

Wafa Wali,Bilel Gargouri,Abdelmajid Ben Hamadou

doi:10.1007/s40595-016-0080-2

Open Access

https://doi.org/10.1007/s40595-016-0080-2

Journal: Vietnam Journal of Computer Science	Publication Date: Sep 16, 2016
Citations: 13	License type: open-access

Affiliation: University of Sfax

Abstract

Measuring sentence similarity is useful in various research fields, such as artificial intelligence, knowledge management, and information retrieval. Several methods have been proposed to measure sentence similarity based on syntactic and/or semantic knowledge. Most of them have been evaluated on English sentences, and their accuracy can decrease when they are applied to other languages. Moreover, their results are unsatisfactory because much relevant semantic knowledge, such as semantic class and thematic role, and syntactico-semantic knowledge, such as semantic predicates, is not taken into account. Admittedly, this kind of knowledge is rare in most lexical resources. Recently, the International Organization for Standardization (ISO) published the Lexical Markup Framework (LMF) ISO-24613 standard for the development of lexical resources. This standard provides, for each meaning of a lexical entry, all the semantic and syntactico-semantic knowledge in a fine-grained structure. Taking advantage of the availability of LMF-standardized dictionaries, we propose in this paper a generic method that enhances the measure of sentence similarity by applying semantic and syntactico-semantic knowledge. An experiment was carried out on Arabic, since this language is processed within our research team and an LMF-standardized Arabic dictionary is at hand in which the semantic and syntactico-semantic knowledge is accessible and well structured. The experiments yielded better results, showing a high correlation with human ratings.

Highlights

The issue of measuring similarity between sentences is crucial in some research fields, such as knowledge management, information retrieval and artificial intelligence
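The correlation with human judgements is computed as $r = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - (\sum x_i)^2}\,\sqrt{n \sum y_i^2 - (\sum y_i)^2}}$, where $x_i$ refers to the $i$th element in the list of human judgements, $y_i$ refers to the corresponding $i$th element in the list of sentence similarities computed by our proposed measure, and $n$ is the number of sentence pairs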
We propose a method that extends previous approaches by enriching the sentence similarity measure with semantic and syntactico-semantic knowledge drawn from Lexical Markup Framework (LMF) standardized dictionaries
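
As an illustration of the correlation with human judgements mentioned in the highlights above, the following is a minimal Python sketch. It assumes the correlation intended is the standard Pearson coefficient over n sentence pairs. The function name `pearson_r` and the sample scores are hypothetical and are not taken from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores.

    x: human similarity judgements, one per sentence pair.
    y: similarity scores computed by an automatic measure.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two sentence pairs are required")

    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # One of the lists is constant, so the coefficient is undefined.
        raise ValueError("correlation is undefined for constant scores")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical scores for five sentence pairs (illustrative only).
    human = [0.10, 0.35, 0.50, 0.80, 0.95]
    computed = [0.20, 0.30, 0.55, 0.70, 0.90]
    print(f"r = {pearson_r(human, computed):.3f}")
```

A value of r close to 1 would indicate that the computed similarities rank and scale the sentence pairs much as the human raters do.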

Summary

Introduction

The issue of measuring similarity between sentences is crucial in some research fields, such as knowledge management, information retrieval and artificial intelligence. Researchers started with statistical methods, such as [2] and [14]. These methods compute sentence similarity by counting the words that co-occur in a string sequence. Other authors proposed semantic-based methods, such as [12] and [14]. These approaches use semantic networks such as WordNet, the vector space model and statistical corpora to compute the semantic similarity between words using different known measures, such as Leacock and Chodorow [11], Wu and Palmer [22] and Jiang and Conrath [8]. These semantic-based methods are limited to computing the sentence similarity based only on semantic similarity between words, whereas the syntactic

Objectives

Methods

Results

Conclusion