Enhanced Amharic-Arabic Cross-Language Information Retrieval System using Part of Speech Tagging

HL Shashirekha, Ibrahim Gashaw

doi:10.1109/icac347590.2019.9036807

Abstract

In this paper, we extend our earlier experiment on Neural Machine Translation (NMT) based query translation for the Amharic-Arabic Cross-Language Information Retrieval (CLIR) task, which retrieves relevant documents from Amharic and Arabic text collections in response to a query expressed in Amharic, by modifying the ranking algorithm to use Part-of-Speech (POS) tags. We use a pre-trained NMT model to map a query in the source language into an equivalent query in the language of the target document collection. The relevant documents are then retrieved using a Language Modeling (LM) based retrieval algorithm in which lambda is substituted with a POS-based LM. The experimental results are compared with four conventional IR models, namely the unigram LM, the bigram LM, the probabilistic model, and the Vector Space Model (VSM). The proposed POS-based LM ranking algorithm outperforms all of them on both the Amharic and Arabic document collections.
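As a rough illustration of the ranking idea, the sketch below scores documents by query likelihood under a unigram LM, with the interpolation weight lambda chosen per query term from its POS tag. The abstract does not specify how lambda is replaced, so this is only one possible reading. It rests on several assumptions: linear (Jelinek-Mercer style) interpolation between document and collection models, a query that is already translated and POS-tagged, and the hypothetical names `POS_LAMBDA`, `DEFAULT_LAMBDA`, `score`, and `rank` with made-up weight values.

```python
import math
from collections import Counter

# Hypothetical per-tag interpolation weights (illustrative values only,
# not taken from the paper).
POS_LAMBDA = {"NOUN": 0.8, "VERB": 0.6, "ADJ": 0.5}
DEFAULT_LAMBDA = 0.3


def score(query, doc_tokens, collection_counts, collection_len):
    """Log query likelihood with a POS-dependent interpolation weight.

    query is a list of (term, pos_tag) pairs; the tags are assumed to come
    from an external POS tagger applied to the translated query.
    """
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    total = 0.0
    for term, tag in query:
        lam = POS_LAMBDA.get(tag, DEFAULT_LAMBDA)
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_col = collection_counts[term] / collection_len if collection_len else 0.0
        p = lam * p_doc + (1.0 - lam) * p_col
        # Terms unseen in the whole collection get a small floor probability.
        total += math.log(p if p > 0.0 else 1e-12)
    return total


def rank(query, docs):
    """Return (doc_id, score) pairs sorted from most to least likely."""
    collection_counts = Counter(t for toks in docs.values() for t in toks)
    collection_len = sum(collection_counts.values())
    scored = [
        (doc_id, score(query, toks, collection_counts, collection_len))
        for doc_id, toks in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = {
    "d1": ["river", "bank", "water"],
    "d2": ["bank", "loan", "money"],
}
query = [("water", "NOUN"), ("bank", "NOUN")]
ranking = rank(query, docs)
```

Here a higher weight for a tag means the document model is trusted more for terms with that tag, so content words such as nouns influence the ranking more strongly than other terms.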

Full Text