Abstract
The widespread use of colloquial dialects among the younger generation of Arabs is depriving many of them of the fruits of information freedom. Although most Arabs have no problem reading text in formal Arabic, widely known as Modern Standard Arabic (MSA), the younger generation is more adept at colloquial Arabic, mainly owing to the widespread use of social media. Current search engines cater mostly to MSA. As a result, material written in colloquial Arabic is effectively off-limits to those who search in MSA, and MSA content is likewise off-limits to those who communicate only in colloquial Arabic. To achieve the full potential of an information-retrieval system, we need a scheme that interprets queries whether they are written in MSA, colloquial Arabic, or a combination of both. In this paper, we design an information-retrieval system that addresses this problem in the context of one of the local dialects of Saudi Arabia. Our system is based on a modified MSA stemming technique and a set of lexicon-based colloquial ↔ MSA conversion rules. We tested the system using 44 queries on a corpus of over 1,400 documents written in MSA, colloquial Arabic, or a mix of both. The average precision was 84.3%, and the average recall was 96.5%. In a second test, we compared the precision of the documents retrieved by our system with that of the Google and Yahoo! search engines; the respective average precisions were 78.2%, 51.9% and 56.2%.
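The abstract does not spell out the conversion rules or how the MSA stemmer was modified. As a rough illustration only, the sketch below assumes a simple two-step query pipeline. First, each query token is looked up in a colloquial-to-MSA lexicon. Second, the token passes through a light stemmer that strips at most one common MSA prefix and one suffix. The lexicon entries (common Gulf colloquial words such as وين "where"), the affix lists, and the function names `light_stem` and `normalize_query` are hypothetical. They are not the authors' lexicon or stemmer.

```python
# Hypothetical colloquial -> MSA lexicon (illustrative entries only;
# this is NOT the lexicon used in the paper).
COLLOQUIAL_TO_MSA = {
    "وش": "ماذا",    # "what"
    "ليش": "لماذا",  # "why"
    "وين": "أين",    # "where"
    "ابغى": "أريد",  # "I want"
}

# Assumed light-stemming affix lists; longer affixes are tried first.
PREFIXES = sorted(["وال", "بال", "كال", "فال", "لل", "ال"], key=len, reverse=True)
SUFFIXES = sorted(["ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة"], key=len, reverse=True)


def light_stem(word: str, min_len: int = 2) -> str:
    """Strip at most one prefix and one suffix, keeping at least min_len letters."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= min_len:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= min_len:
            word = word[: -len(s)]
            break
    return word


def normalize_query(query: str) -> list[str]:
    """Map colloquial tokens to MSA via the lexicon, then light-stem every token."""
    terms = []
    for token in query.split():
        msa = COLLOQUIAL_TO_MSA.get(token, token)  # unknown tokens pass through unchanged
        terms.append(light_stem(msa))
    return terms


if __name__ == "__main__":
    print(normalize_query("وين المكتبة"))
```

For example, `normalize_query("وين المكتبة")` maps the colloquial وين to the MSA أين and stems المكتبة to مكتب. A colloquial query and its MSA counterpart therefore reduce to the same index terms. Applying the lexicon before stemming keeps the sketch simple, but a fuller system would also need to handle colloquial words that carry affixes.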