Abstract

In recent years, social web users in Arabic-speaking countries have increasingly used dialects as a written language in their social exchanges. Arabic dialects derive from Modern Standard Arabic (MSA) and differ significantly from one country to another and from one region to another. The use of these dialects has led to growing interest within the NLP community in the specificities of such informal languages and in their automatic processing. In this work, we deal with the Tunisian dialect (TD) in particular. We address the automatic Latin-to-Arabic transliteration of TD language productions on the social web and propose an approach that models transliteration as a sequence labeling task. At the word level, we test several techniques based on machine learning and deep learning, using real-world messages extracted from social networks. We experiment with and compare three transliteration models: a Conditional Random Fields-based model (CRF), a Bidirectional Long Short-Term Memory-based model (BLSTM), and a BLSTM-based model with CRF decoding (BLSTM-CRF). The results show that BLSTM-CRF leads to the best performance, reaching 96.78% of correctly transliterated words. We also evaluate the BLSTM-CRF transliteration approach in context on a set of random TD messages extracted from the social web. We obtain a total error rate of 2.7%, 25% of which are context errors.
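
To make the sequence labeling framing concrete, the following minimal Python sketch shows one way such a task could be represented: each Latin character of a word receives one label, and the Arabic output is the concatenation of the labels. The alignments, the `EMPTY` label, and the helper names `decode` and `word_accuracy` are illustrative assumptions and are not taken from the paper; in the approach described above, the labels would be predicted by a trained CRF, BLSTM, or BLSTM-CRF model rather than supplied by hand.

```python
from typing import List, Sequence

# Hypothetical label for a Latin character that yields no Arabic output
# (for example, a short vowel that is not written in Arabic script).
EMPTY = ""


def decode(labels: Sequence[str]) -> str:
    """Concatenate per-character labels into the transliterated word."""
    return "".join(labels)


def word_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of words whose transliteration exactly matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Toy, hand-made alignments (assumptions for illustration only):
# one label per Latin character.
toy_examples = [
    ("7ob", ["ح", EMPTY, "ب"]),   # "7" -> ح, "o" -> no output, "b" -> ب
    ("3am", ["ع", "ا", "م"]),     # "3" -> ع, "a" -> ا, "m" -> م
]

for latin, labels in toy_examples:
    # A sequence labeler must emit exactly one label per input character.
    assert len(latin) == len(labels)
    print(latin, "->", decode(labels))
```

In this framing, the word-level score reported in the abstract corresponds to an exact-match measure such as `word_accuracy`, computed over the decoded words.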
