Abstract
Arabizi is a form of written Arabic that uses Latin letters, numerals and punctuation instead of Arabic script. Most work in the literature concentrates on Arabic and neglects Arabizi. For automatic translation and sentiment analysis, some approaches handle Arabizi like any other language, while others add a transliteration phase that converts Arabizi into Arabic script. In this context, the main purpose of this study is to determine whether transliterating Arabizi improves automatic translation and sentiment analysis results. We introduce a rule-based automatic transliteration system and apply it to a collection of messages before carrying out machine translation and sentiment analysis. To evaluate the effect of transliteration on these tasks, we also present the construction of a set of lexical resources: a manually constructed parallel corpus between Arabizi and Modern Standard Arabic (MSA), a sentiment lexicon built automatically and revised manually, and an annotated sentiment corpus constructed automatically from the sentiment lexicon. We also apply a set of algorithms and models dedicated to machine translation and sentiment analysis, including shallow and deep classifiers as well as different embedding-based models for feature extraction. The experimental results show a consistent improvement after applying transliteration, with a BLEU score of up to 13.06 for automatic translation and an F1-score of up to 92% for sentiment classification. These results affirm that transliteration is a key factor in handling Arabizi.
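To make the idea of rule-based transliteration concrete, the following is a minimal sketch of a longest-match substitution over a small mapping table. The table `RULES` and the function `transliterate` are hypothetical names introduced here for illustration; the mappings reflect widely used Arabizi conventions (for example `7` for ح and `3` for ع) and are not the rule set of the system described in this study, which is likely to be larger and context-sensitive.

```python
# Hypothetical rule table based on common Arabizi conventions.
# This is an assumption for illustration, not the rule set used in the study.
RULES = {
    "kh": "خ",
    "sh": "ش",
    "gh": "غ",
    "2": "ء",
    "3": "ع",
    "5": "خ",
    "7": "ح",
    "a": "ا",
    "b": "ب",
    "k": "ك",
    "l": "ل",
    "m": "م",
    "n": "ن",
    "r": "ر",
    "s": "س",
}


def transliterate(text, rules=RULES):
    """Convert Arabizi text to Arabic script with greedy longest-match rules.

    Characters with no matching rule are copied through unchanged.
    """
    # Try longer keys first so that "sh" wins over "s".
    keys = sorted(rules, key=len, reverse=True)
    lowered = text.lower()
    out = []
    i = 0
    while i < len(lowered):
        for key in keys:
            if lowered.startswith(key, i):
                out.append(rules[key])
                i += len(key)
                break
        else:
            # No rule applies at this position: keep the character as-is.
            out.append(lowered[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("mr7ba"))
```

A context-free table such as this one cannot resolve ambiguous spellings or decide which short vowels to omit, so it should be read only as an illustration of the general approach.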