Abstract

Today's Arabic Handwriting recognition systems are able to recognize arbitrary words over a large but finite vocabulary. Systems operating with a fixed vocabulary are bound to encounter so-called out-of-vocabulary (OOV) words. The aim of this research is to propose a two-step approach that tackles the problem of OOV words in Arabic handwriting. In the first step, we exploit different types of sub-word units to detect the potential OOVs. In the recovery stage, a dynamic dictionary is built to extend the initial static word lexicon in order to cope with the detected OOVs. The recovery includes a selection step in which the best word candidates extracted from the external resource are kept. Experiments were conducted on the public benchmarking KHATT and AHTID/MW databases. The obtained results revealed that sub-word modeling could give cues for improving the detection and that the use of a dynamic dictionary significantly improves the recognition performance compared to one-step approaches that are based on a large static dictionary or the combination of different sub-word units. We achieve the state of the art results on the KHATT dataset.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.