Improving Arabic information retrieval using word embedding similarities

Abdelkader El Mahdaouy, Saïd Ouatik El Alaoui, Eric Gaussier

doi:10.1007/s10772-018-9492-y

Abstract

Term mismatch is a common limitation of traditional information retrieval (IR) models, in which relevance scores are estimated from exact matches between document and query terms. A good IR model should also take distinct but semantically similar words into account during matching. In this paper, we propose a method to incorporate word embedding (WE) semantic similarities into existing probabilistic IR models for Arabic in order to deal with term mismatch. Experiments are performed on the standard Arabic TREC collection using three neural word embedding models. The results show that the proposed extensions significantly outperform their baseline bag-of-words models, whereas the differences among the three evaluated neural word embedding models are not statistically significant. Moreover, the overall comparison shows that our extensions significantly outperform the Arabic WordNet-based semantic indexing approach and three recent WE-based IR language models.
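The abstract does not state the formulas, so the following is only an illustrative sketch of one common way embedding similarities can soften exact matching; it is not the authors' method. It assumes a similarity-weighted term frequency: a query term's count in a document is its exact-match count plus the counts of document terms whose cosine similarity to it exceeds a threshold, each weighted by that similarity. The function names, the 0.7 threshold, and the toy two-dimensional vectors (with English placeholder words) are all hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def soft_term_frequency(query_term, doc_tokens, embeddings, threshold=0.7):
    """Exact-match count plus similarity-weighted counts of similar document terms.

    `threshold` is an assumed cut-off, not a value taken from the paper.
    """
    counts = Counter(doc_tokens)
    tf = float(counts.get(query_term, 0))
    q_vec = embeddings.get(query_term)
    if q_vec is None:
        # No vector for the query term: fall back to exact matching only.
        return tf
    for term, count in counts.items():
        if term == query_term or term not in embeddings:
            continue
        sim = cosine(q_vec, embeddings[term])
        if sim >= threshold:
            tf += sim * count
    return tf


# Hypothetical toy vectors; real embeddings would come from a trained model.
toy_embeddings = {
    "car": [1.0, 0.0],
    "automobile": [0.9, 0.1],
    "banana": [0.0, 1.0],
}
doc = ["automobile", "banana", "automobile"]

exact_tf = soft_term_frequency("car", doc, {})             # exact matching only
soft_tf = soft_term_frequency("car", doc, toy_embeddings)  # similarity-aware
print(exact_tf, soft_tf)
```

With exact matching the document gets no credit for the query term "car", while the similarity-aware count credits the two occurrences of "automobile". A count of this kind could then be substituted for the raw term frequency in a bag-of-words scoring function.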

Journal: International Journal of Speech Technology
Publication Date: Jan 19, 2018
Citations: 30

Similar Papers

Semantically enhanced term frequency based on word embeddings for Arabic information retrieval
Abdelkader El Mahdaouy ... Said Ouatik El Alaoui
01 Oct 2016

Neural generative models and representation learning for information retrieval
Qingyao Ai
ACM SIGIR Forum | VOL. 53
01 Dec 2019

Information Retrieval: Concepts, Models, and Systems
Venkat N Gudivada ... Dhana L Rao
01 Jan 2018

Neural sentence embedding models for semantic similarity estimation in the biomedical domain
Kathrin Blagec ... Asan Agibetov
BMC Bioinformatics | VOL. 20
11 Apr 2019
