Abstract
• First work on automatically adding dots to Arabic text written without dots (i.e., Rasms).
• Dots are added automatically using deep recurrent neural networks.
• Evaluated on four different text corpora.
• Character error rates (CER) ranging from 2.0% to 5.5% on independent test sets.

In their early stages, Arabic letters were only shapes (Rasm) without dots. Dots were added later to ease reading and reduce ambiguity. Thereafter, diacritics were introduced for phonetic guidance, mainly for non-native speakers. Many studies have been conducted to automatically diacritize Arabic text using machine learning techniques. However, to the best of our knowledge, automatically adding dots to Arabic Rasms has not been reported in the literature. In this work, we present the automatic addition of dots to Arabic Rasms using deep recurrent neural networks. Different design choices were explored, including the use of character sequences and word sequences as tokens. The presented techniques were evaluated on four diverse, publicly available datasets. Character-level models with a stacked BiGRU architecture outperformed all other architectures, with character error rates ranging from 2.0% to 5.5% and dottization error rates ranging from 4.2% to 11.0% on independent test sets.
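To make the task and the headline metric concrete, here is a minimal sketch that is not the authors' code. It assumes the standard definition of character error rate (character-level Levenshtein distance divided by the reference length); the paper's exact CER convention, and its definition of the dottization error rate, may differ and are not reproduced here. The `to_rasm` helper and `PARTIAL_RASM_MAP` are hypothetical illustrations of how a dotless input could be derived from dotted text. They cover only letters whose undotted shape is the same in every position (e.g. ج and خ collapse onto ح), and deliberately leave out position-dependent groups such as ب/ت/ث/ن/ي and ف/ق, so this is a partial illustration rather than a complete rasm converter.

```python
# Hypothetical, partial mapping from dotted letters to their dotless (rasm) shape.
# Only letters whose undotted form is the same in every position are included;
# position-dependent groups (ب ت ث ن ي, ف ق) are intentionally omitted.
PARTIAL_RASM_MAP = {
    "\u062C": "\u062D",  # ج -> ح
    "\u062E": "\u062D",  # خ -> ح
    "\u0630": "\u062F",  # ذ -> د
    "\u0632": "\u0631",  # ز -> ر
    "\u0634": "\u0633",  # ش -> س
    "\u0636": "\u0635",  # ض -> ص
    "\u0638": "\u0637",  # ظ -> ط
    "\u063A": "\u0639",  # غ -> ع
}


def to_rasm(text: str) -> str:
    """Strip dots from letters covered by PARTIAL_RASM_MAP (illustrative only)."""
    return "".join(PARTIAL_RASM_MAP.get(ch, ch) for ch in text)


def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return levenshtein(ref, hyp) / len(ref)
```

For example, `to_rasm("خشب")` returns `"حسب"`, and scoring that dotless form directly against the dotted original gives `cer("خشب", "حسب")` of 2/3, since two of the three characters differ only in their dots. A dottization model has to recover those dots from context, and its output would be scored the same way against the dotted reference.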