Semantic Similarity of Arabic Sentences with Word Embeddings

El Moatez Billah Nagoudi, Didier Schwab

doi:10.18653/v1/w17-1303

Abstract

Semantic textual similarity is the basis of countless applications and plays an important role in diverse areas, such as information retrieval, plagiarism detection, information extraction and machine translation. This article proposes an innovative word embedding-based system devoted to calculating the semantic similarity between Arabic sentences. The main idea is to exploit vectors as word representations in a multidimensional space in order to capture the semantic and syntactic properties of words. IDF weighting and Part-of-Speech tagging are applied to the examined sentences to support the identification of words that are highly descriptive in each sentence. The performance of our proposed system is confirmed through the Pearson correlation between our assigned semantic similarity scores and human judgments.

Highlights

Text Similarity is an important task in several application fields, such as information retrieval, plagiarism detection, machine translation, topic detection, text classification, text summarization and others
We consider the IDF weighting and Part-of-Speech tagging techniques in order to improve the identification of words that are highly descriptive in each sentence
An alternative technique is the application of Part-of-Speech (POS) tagging to identify the words that are highly descriptive in each input sentence (Schwab, 2005; Lioma and Blanco, 2009)
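The two weighting ideas in the highlights can be illustrated with a minimal sketch. It assumes that a sentence vector is a weighted sum of word vectors, with the weight coming either from IDF or from a per-tag POS table. The toy embeddings, the transliterated tokens, the tag names and the tag weights are all hypothetical and are not taken from the paper.

```python
import math


def idf_weights(corpus):
    """Return idf(w) = log(N / df(w)) for every word in a tokenized corpus."""
    n_docs = len(corpus)
    doc_freq = {}
    for doc in corpus:
        for word in set(doc):  # count each word at most once per document
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def weighted_sentence_vector(tokens, embeddings, weight_of):
    """Sum the word vectors of a sentence, each scaled by weight_of(token).

    Assumes `embeddings` is a non-empty dict of equal-length lists.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for tok in tokens:
        vec = embeddings.get(tok)
        if vec is None:
            continue  # out-of-vocabulary tokens contribute nothing
        w = weight_of(tok)
        for i, x in enumerate(vec):
            total[i] += w * x
    return total


# Hypothetical toy data: transliterated tokens and made-up 3-d vectors.
EMBEDDINGS = {"fi": [0.3, 0.3, 0.3], "qitt": [0.9, 0.1, 0.0]}
CORPUS = [["fi", "qitt"], ["fi", "kalb"], ["fi", "sayyara"]]
POS_TAGS = {"fi": "PREP", "qitt": "NOUN"}   # assumed tagger output
POS_WEIGHTS = {"NOUN": 1.0, "PREP": 0.1}    # illustrative values only

idf = idf_weights(CORPUS)
sentence = ["fi", "qitt"]

# IDF weighting: "fi" occurs in every document, so its weight is log(1) = 0.
idf_vec = weighted_sentence_vector(sentence, EMBEDDINGS, lambda t: idf.get(t, 0.0))

# POS weighting: the preposition is scaled down relative to the noun.
pos_vec = weighted_sentence_vector(
    sentence, EMBEDDINGS, lambda t: POS_WEIGHTS[POS_TAGS[t]]
)
```

In both variants the frequent function word barely moves the sentence vector, so the descriptive noun dominates. The resulting vectors can then be compared with any vector similarity measure.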

Summary

Introduction

Text similarity is an important task in several application fields, such as information retrieval, plagiarism detection, machine translation, topic detection, text classification and text summarization. There are two known types of word similarity: lexical and semantic. The second type aims to quantify the degree to which two words are semantically related. For example, two words can be synonyms, represent the same thing, or be used in the same context. In this article we focus on measuring the semantic similarity between short Arabic sentences using word embedding representations. The rest of this article begins by describing work related to word representations in vector space.
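As a rough illustration of comparing short sentences through word embeddings, the sketch below sums the word vectors of each sentence and compares the sums with cosine similarity. This is one common baseline and is shown under assumptions: the 3-dimensional vectors and the transliterated tokens are invented for the example, and the paper's own model and preprocessing are not reproduced here.

```python
import math


def sentence_vector(tokens, embeddings):
    """Sum the embedding vectors of the tokens that have an embedding.

    Assumes `embeddings` is a non-empty dict of equal-length lists.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for tok in tokens:
        vec = embeddings.get(tok)
        if vec is None:
            continue  # skip out-of-vocabulary tokens
        for i, x in enumerate(vec):
            total[i] += x
    return total


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sentence_similarity(tokens_a, tokens_b, embeddings):
    """Cosine similarity between the summed word vectors of two sentences."""
    return cosine(
        sentence_vector(tokens_a, embeddings),
        sentence_vector(tokens_b, embeddings),
    )


# Hypothetical toy embeddings (transliterated tokens, made-up values).
TOY_EMBEDDINGS = {
    "qitt": [0.9, 0.1, 0.0],     # "cat"
    "kalb": [0.8, 0.2, 0.1],     # "dog"
    "yanam": [0.5, 0.5, 0.0],    # "sleeps"
    "sayyara": [0.0, 0.1, 0.9],  # "car"
}

sim_close = sentence_similarity(["qitt", "yanam"], ["kalb", "yanam"], TOY_EMBEDDINGS)
sim_far = sentence_similarity(["qitt", "yanam"], ["sayyara"], TOY_EMBEDDINGS)
```

With these toy vectors the two animal sentences score higher than the animal and vehicle pair, which is the behaviour a semantic similarity measure is expected to show.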

Word Embedding Models

Model Used

Sentences similarity

Words Similarity

No Weighting Method

IDF Weighting Method

Part-of-speech weighting Method

Preprocessing

Results

Experiments And Results
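The abstract states that performance is confirmed through the Pearson correlation between the system's similarity scores and human judgments. A minimal sketch of that measure follows. The score lists are made up for illustration and are not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up example: system similarity scores vs. human ratings per sentence pair.
system_scores = [0.91, 0.35, 0.62, 0.10]
human_scores = [4.5, 2.0, 3.0, 0.5]

r = pearson(system_scores, human_scores)
```

The two lists may use different scales, since the Pearson coefficient is invariant to positive linear rescaling of either sequence.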

Conclusion and Future Work

Publication Date: Jan 1, 2017
Citations: 72
License type: cc-by

