Abstract

Arabic is the official language of all Arab countries; it is used for official speech, newspapers, public administration, and schooling. In parallel, for everyday communication, informal talk, songs, and movies, Arab people use their dialects, which are inspired by Standard Arabic and differ from one Arab country to another. This linguistic phenomenon is called diglossia, a situation in which two distinct varieties of a language are spoken within the same speech community. It is observed throughout the Arab countries: Standard Arabic is widely written but not used in everyday conversation, whereas dialect is widely spoken in everyday life but almost never written. Consequently, in NLP, a great deal of work has been dedicated to written Arabic. In contrast, Arabic dialects were not studied enough until recently, and interest in them is new. The first work on these dialects began in the last decade with Middle Eastern ones, and the dialects of the Maghreb are only beginning to be studied. Compared to written Arabic, dialects are under-resourced languages that suffer from a lack of NLP resources despite their wide use. In this paper we deal with the Algerian Arabic dialect, a non-resourced language for which no known resource is available to date. We present a first linguistic study introducing its most important features, and we describe the resources that we created from scratch for this dialect.

Highlights

  • Under-resourced languages are languages that lack resources dedicated to natural language processing.

  • Because ALG is an Arabic dialect, we adopted the following writing policy: when writing a word in the Algiers dialect, we check whether there is an Arabic word close to the dialect word. If one exists, we adopt the Arabic spelling for the dialect word; otherwise, the word is written as it is pronounced.

  • Verbs following the first three patterns are converted to the Algiers dialect pattern by changing the diacritic marks […] to […], while verbs corresponding to pattern […] are kept as they are.
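
The writing policy highlighted above can be illustrated with a minimal sketch. The source does not define what counts as a "close" Arabic word, so the sketch assumes a string-similarity score from Python's `difflib` with an arbitrary threshold of 0.8. The function name `choose_spelling`, the threshold, and the Latin-transliterated toy lexicon are all hypothetical and are not taken from the paper.

```python
from difflib import SequenceMatcher


def choose_spelling(pronounced, arabic_lexicon, threshold=0.8):
    """Pick a written form for a dialect word.

    pronounced     -- the dialect word written as it is pronounced
    arabic_lexicon -- iterable of Standard Arabic spellings (toy data here)
    threshold      -- ASSUMPTION: minimum similarity to count as "close"

    Returns (spelling, source), where source is "arabic" when a close
    Arabic word was found and "phonetic" otherwise.
    """
    best_word, best_score = None, 0.0
    for candidate in arabic_lexicon:
        # ASSUMPTION: closeness is approximated by difflib's similarity ratio.
        score = SequenceMatcher(None, pronounced, candidate).ratio()
        if score > best_score:
            best_word, best_score = candidate, score

    if best_word is not None and best_score >= threshold:
        # A close Arabic word exists: adopt the Arabic spelling.
        return best_word, "arabic"
    # No close Arabic word: keep the word as it is pronounced.
    return pronounced, "phonetic"


# Toy, transliterated lexicon for illustration only.
toy_lexicon = ["kitab", "qalam"]
print(choose_spelling("ktab", toy_lexicon))
print(choose_spelling("tomobil", toy_lexicon))
```

In practice the notion of closeness would likely be decided by a linguist or by rules specific to the dialect; the similarity ratio is only a stand-in to make the two branches of the policy concrete.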


Summary

INTRODUCTION

Under-resourced languages are languages that lack resources dedicated to natural language processing. These languages suffer from the unavailability of basic tools such as corpora, monolingual or multilingual dictionaries, and morphological and syntactic analyzers. This lack of resources makes working with these languages a great challenge, especially when dealing with unwritten languages like Arabic dialects. Unlike Middle Eastern Arabic dialects, Algerian Arabic dialects are non-resourced languages: they lack all kinds of NLP resources. This paper is organized as follows: before dealing with the Algerian dialect, we give in Section II a brief overview of the Arabic language, and in Section III we present different aspects of ALG. We conclude by summarizing the main ideas of this work and outlining our future directions.

ARABIC LANGUAGE
SPECIFICITIES OF ALGIERS DIALECT
Syntactic level
GRAPHEME-TO-PHONEME CONVERSION
Issues of G2P conversion for Algiers dialect
Rule based approach
Statistical Approach
Related works
Adopted Approach
Building the dialect dictionary
Experiment
Results
CONCLUSION
Findings
Definite article 4 Words Case-ending 5 Long vowel rules