Moroccan Data-Driven Spelling Normalization Using Character Neural Embedding

Ridouane Tachicart, Karim Bouzoubaa

doi:10.1142/s2196888821500044

Open Access

Journal: Vietnam Journal of Computer Science
Publication Date: Oct 5, 2020
Citations: 5
License type: cc-by

Affiliation: Mohammed V University

Abstract

With the growth of Web use in Morocco, the Internet has become an important source of information. On social media in particular, Moroccans communicate in several languages, leaving behind unstructured user-generated text (UGT) that presents several opportunities for Natural Language Processing (NLP). Among the languages found in this data, Moroccan Arabic (MA) stands out for its large amount of content and its several features. In this paper, we investigate written text generated online by Moroccan users on social media, with an emphasis on Moroccan Arabic. To this end, we study the data in depth in several steps, using tools such as a language identification system. The most interesting findings are the use of code-switching, the use of multiple scripts, and the low number of words in Moroccan UGT. Moreover, we used the investigated data to build a new Moroccan language resource: a lexicon of orthographic variants of Moroccan words, built with an unsupervised approach using character neural embedding. This lexicon can be useful for several NLP tasks, such as spelling normalization.
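
To make the idea of an orthographic variants lexicon concrete, the following is a minimal sketch and not the authors' method. It assumes that plain character-bigram counts compared by cosine similarity can stand in for the learned character neural embedding mentioned above. The function names (char_ngrams, cosine, variant_lexicon), the 0.6 threshold, and the romanized spellings are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(word, n=2):
    """Bag of character n-grams, with boundary markers around the word.

    Assumption: this count vector is a simple stand-in for a learned
    character-level embedding.
    """
    padded = f"<{word}>"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def variant_lexicon(words, threshold=0.6):
    """Greedily group words under the first representative they resemble.

    No labels are used, so the grouping is unsupervised. The threshold is
    an arbitrary illustrative value, not one taken from the paper.
    """
    vectors = {w: char_ngrams(w) for w in words}
    lexicon = {}
    for w in words:
        for rep in lexicon:
            if cosine(vectors[w], vectors[rep]) >= threshold:
                lexicon[rep].append(w)
                break
        else:
            # No existing group is close enough: start a new one.
            lexicon[w] = [w]
    return lexicon


# Toy input: hypothetical romanized spellings, not data from the paper.
groups = variant_lexicon(["bzaf", "bzzaf", "salam", "slam"])
print(groups)
```

In a spelling normalization setting, each key of the resulting dictionary could serve as the normalized form for the variants listed under it. A learned embedding would replace char_ngrams, while the grouping step could stay the same.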
