Abstract

This paper extends our first experiment on Neural Machine Translation (NMT) based query translation for the Amharic-Arabic Cross-Language Information Retrieval (CLIR) task, in which relevant documents are retrieved from Amharic and Arabic text collections in response to a query expressed in Amharic. The extension modifies the ranking algorithm to use part-of-speech (POS) tags. We use a pre-trained NMT model to map a query in the source language into an equivalent query in the language of the target document collection. Relevant documents are then retrieved with a Language Modeling (LM) based retrieval algorithm in which lambda is replaced by a POS-based LM. The experimental results are compared with four conventional IR models: unigram LM, bigram LM, the probabilistic model, and the Vector Space Model (VSM). The proposed POS-based LM ranking algorithm outperforms all of them on both the Amharic and the Arabic document collections.
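For readers unfamiliar with LM-based retrieval, the following is a minimal sketch of a conventional unigram query-likelihood ranker with Jelinek-Mercer smoothing, which is one common form of the unigram LM baseline and the usual setting in which a lambda interpolation weight appears. It is not the authors' implementation: the abstract does not specify how the POS-based LM replaces lambda, so that step is not reproduced here. The function names, the default `lam=0.5`, and the toy tokens are all assumptions made for illustration.

```python
import math
from collections import Counter


def jm_query_log_likelihood(query, doc, collection, lam=0.5):
    """Log-likelihood of a query under a Jelinek-Mercer smoothed unigram LM.

    Assumed convention: `lam` weights the collection model and (1 - lam)
    weights the document model. All arguments are lists of tokens.
    """
    doc_tf = Counter(doc)
    col_tf = Counter(collection)
    doc_len = len(doc)
    col_len = len(collection)

    score = 0.0
    for term in query:
        p_doc = doc_tf[term] / doc_len if doc_len else 0.0
        p_col = col_tf[term] / col_len if col_len else 0.0
        p = (1 - lam) * p_doc + lam * p_col
        if p == 0.0:
            # The term occurs nowhere in the collection.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, lam=0.5):
    """Return document ids sorted from most to least likely to generate the query.

    `docs` maps a document id to its list of tokens (hypothetical layout).
    """
    collection = [token for tokens in docs.values() for token in tokens]
    scores = {
        doc_id: jm_query_log_likelihood(query, tokens, collection, lam)
        for doc_id, tokens in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy placeholder data, not drawn from the Amharic or Arabic collections.
toy_docs = {"d1": ["a", "b", "a"], "d2": ["b", "c", "c"]}
ranking = rank(["a"], toy_docs)
```

In a CLIR pipeline of the kind described above, the `query` tokens would be the output of the NMT translation step, tokenized the same way as the target-language collection.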

