Abstract

The widespread use of colloquial dialects among the younger generation of Arabs is depriving many of them of the fruits of information freedom. Although most Arabs have no problem reading text in formal Arabic, widely known as Modern Standard Arabic (MSA), the younger generation is more adept at colloquial Arabic, mainly owing to the widespread use of social media. Current search engines cater mostly to MSA. This means that material written in colloquial Arabic is effectively off-limits to those who search in MSA, and MSA content is similarly off-limits to those who communicate only in colloquial Arabic. To achieve the full potential of an information-retrieval system, we need a scheme that interprets queries whether they are in MSA, colloquial Arabic or a combination of both. In this paper we design an information-retrieval system that addresses this problem, using one of the local dialects of Saudi Arabia as a case study. Our system is based on a modified MSA stemming technique and a set of lexicon-based colloquial ↔ MSA conversion rules. We tested the system using 44 queries on a corpus of over 1400 documents (MSA, colloquial and mixed). The average precision was 84.3%, while the average recall was 96.5%. In a second test we compared the precision of the documents retrieved by our system with that of the Google and Yahoo! search engines. The respective average precisions were 78.2%, 51.9% and 56.2%.
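For readers unfamiliar with the reported metrics, the following is a minimal sketch of how per-query precision and recall can be computed and averaged. The abstract does not state how the averages were taken; the sketch assumes a plain mean over queries. The function names and the document identifiers are hypothetical and do not come from the paper.

```python
from typing import Iterable, List, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Return (precision, recall) for a single query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def mean_precision_recall(results: List[Tuple[Iterable[str], Iterable[str]]]) -> Tuple[float, float]:
    """Average precision and recall over queries (assumed: unweighted mean)."""
    if not results:
        return 0.0, 0.0
    pairs = [precision_recall(retrieved, relevant) for retrieved, relevant in results]
    mean_p = sum(p for p, _ in pairs) / len(pairs)
    mean_r = sum(r for _, r in pairs) / len(pairs)
    return mean_p, mean_r


# Toy data with hypothetical document IDs: (retrieved, relevant) per query.
toy_results = [
    (["d1", "d2", "d3", "d4"], ["d1", "d2", "d3"]),  # precision 0.75, recall 1.0
    (["d5", "d6"], ["d5", "d6", "d7", "d8"]),        # precision 1.0, recall 0.5
]
avg_precision, avg_recall = mean_precision_recall(toy_results)
print(f"average precision = {avg_precision:.3f}, average recall = {avg_recall:.3f}")
```

Under this definition, a high recall with a lower precision (as in the first test reported above) means that nearly all relevant documents are returned, along with some that are not relevant.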
