Modern information retrieval in Arabic – catering to standard and colloquial Arabic users

Aqil M Azmi, Eman A Aljafari

doi:10.1177/0165551515585720

Abstract

The widespread use of colloquial dialects among the younger generation of Arabs is depriving many of them of the fruits of information freedom. Although most Arabs have no problem reading text in formal Arabic, widely known as Modern Standard Arabic (MSA), the younger generation is more adept at colloquial Arabic, mainly owing to the widespread use of social media. Current search engines cater mostly to MSA. This means that material written in colloquial Arabic is off-limits to those who use MSA, and likewise MSA content is off-limits to those who communicate only in colloquial Arabic. To achieve the full potential of an information-retrieval system, we need a scheme that successfully interprets queries whether they are in MSA, colloquial Arabic or a combination of both. In this paper we design an information-retrieval system that addresses this concern, using one of the local dialects in Saudi Arabia as the test case. Our system is based on a modified MSA stemming technique and a set of lexicon-based colloquial ↔ MSA conversion rules. We tested the system using 44 queries on a corpus of over 1400 documents (MSA, colloquial and mixed). The average precision was 84.3%, while the average recall was 96.5%. In a second test we compared the precision of the documents retrieved by our system with that of the Google and Yahoo! search engines. The respective average precisions were 78.2%, 51.9% and 56.2%.
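The abstract reports precision and recall averaged over a set of queries. As a minimal sketch of how such figures can be computed, the Python below averages per-query precision and recall. It assumes a simple unweighted mean over queries and set-based relevance judgements; the abstract does not say how the authors averaged, and the function names and document ids are hypothetical.

```python
from typing import Iterable, Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one query, given sets of document ids."""
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision_recall(
    runs: Iterable[Tuple[Set[str], Set[str]]]
) -> Tuple[float, float]:
    """Unweighted mean of per-query precision and recall.

    Each item in `runs` is (retrieved_ids, relevant_ids) for one query.
    Assumption: every query counts equally, whatever its result size.
    """
    pairs = [precision_recall(retrieved, relevant) for retrieved, relevant in runs]
    if not pairs:
        raise ValueError("at least one query is required")
    n = len(pairs)
    avg_precision = sum(p for p, _ in pairs) / n
    avg_recall = sum(r for _, r in pairs) / n
    return avg_precision, avg_recall


# Hypothetical judgements for two queries (not data from the paper).
example_runs = [
    ({"d1", "d2"}, {"d1"}),        # one of two retrieved documents is relevant
    ({"d3"}, {"d3", "d4"}),        # one of two relevant documents is retrieved
]
avg_p, avg_r = average_precision_recall(example_runs)
```

Note that this mean of per-query precision differs from the ranked "average precision" (AP) measure used in some retrieval evaluations; which one the reported figures correspond to is not stated in the abstract.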

Journal: Journal of Information Science
Publication Date: May 18, 2015
Citations: 12

Similar Papers

Universal web accessibility and the challenge to integrate informal Arabic users: a case study
Aqil M Azmi ... Eman A Aljafari
Universal Access in the Information Society | VOL. 17
03 Feb 2017

Classifying and Segmenting Classical and Modern Standard Arabic using Minimum Cross-Entropy
Ibrahim S ... William J
International Journal of Advanced Computer Science and Applications | VOL. 8
01 Jan 2017

A Lexical Distance Study of Arabic Dialects
Kathrein Abu Kwaik ... Simon Dobnik
Procedia Computer Science | VOL. 142
01 Jan 2018

Automatic Arabic Dialect Classification Using Deep Learning Models
Leena Lulu ... Ashraf Elnagar
Procedia Computer Science | VOL. 142
01 Jan 2018
