Abstract

Annotation Guidelines for Non-native Arabic Text in the Qatar Arabic Language Bank

The Qatar Arabic Language Bank (QALB) is a corpus of naturally written, unedited Arabic and its manually edited corrections. QALB has about 1.5 million words of text written and post-edited by native speakers. The corpus was the focus of a shared task on automatic spelling correction at the Arabic Natural Language Processing Workshop, held in conjunction with the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) in Doha, in which nine research teams from around the world competed. In this poster we discuss some of the challenges of extending QALB to include non-native Arabic text. Our overarching goal is to use QALB data to develop components for the automatic detection and correction of language errors, which can help learners of Standard Arabic (native and non-native) improve the quality of the Arabic text they produce. The QALB annotation guidelines have so far focused on native speaker text. Learners of Arabic as a second language (L2 speakers) typically have to adapt to a different script and a different vocabulary with new grammatical rules. As a result, L2 speakers make errors of a different nature than those produced by native speakers (L1 speakers), whose errors are mostly influenced by their dialects, their level of education, and their use of Standard Arabic. Our extended L2 guidelines build on our L1 guidelines, with a focus on the types of errors usually found in L2 writing and on how to deal with problematic ambiguous cases. Annotated examples are provided in the guidelines to illustrate the various annotation rules and their exceptions. As with the L1 guidelines, L2 texts should be corrected with the minimum number of edits needed to produce semantically coherent (accurate) and grammatically correct (fluent) Arabic. The guidelines also define a priority order for corrections that prefers less intrusive edits: inflection first, then cliticization, derivation, preposition correction, word choice correction, and finally word insertion.

This project is supported by the National Priority Research Program (NPRP grant 4-1058-1-168) of the Qatar National Research Fund (a member of the Qatar Foundation). The statements made herein are solely the responsibility of the authors.
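The priority order for corrections described in the abstract can be illustrated with a minimal Python sketch. It assumes an annotator (or tool) already has several candidate corrections for one error, each labeled with an edit type. The names `EditType` and `least_intrusive`, and the placeholder candidate strings, are hypothetical and are not taken from the QALB guidelines or tooling; only the ordering of the six edit types comes from the abstract.

```python
from enum import IntEnum


class EditType(IntEnum):
    """Edit types ranked from least to most intrusive (order from the abstract)."""
    INFLECTION = 1
    CLITICIZATION = 2
    DERIVATION = 3
    PREPOSITION = 4
    WORD_CHOICE = 5
    WORD_INSERTION = 6


def least_intrusive(candidates):
    """Return the candidate whose edit type ranks earliest in the priority order.

    `candidates` is a list of (EditType, corrected_text) pairs (a hypothetical
    representation). Returns None when the list is empty. Ties keep the first
    candidate listed, because min() returns the first minimal element.
    """
    if not candidates:
        return None
    return min(candidates, key=lambda pair: pair[0])


# Placeholder strings stand in for alternative corrections of the same error.
options = [
    (EditType.WORD_CHOICE, "correction via word choice"),
    (EditType.INFLECTION, "correction via inflection"),
    (EditType.PREPOSITION, "correction via preposition"),
]
chosen = least_intrusive(options)
```

Here `chosen` is the inflection-based candidate, since inflection is the least intrusive edit type in the stated order. The sketch only ranks candidates that are already judged accurate and fluent; deciding whether a candidate meets that bar remains the annotator's task.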
