Abstract

Character segmentation plays an important role in Arabic optical character recognition (OCR) systems, because incorrectly segmented letters lead to unrecognized characters. The accuracy of character recognition depends mainly on the segmentation algorithm used. The domain of off-line handwriting in the Arabic script presents unique technical challenges and has been addressed more recently than other domains. Many different segmentation algorithms for off-line Arabic handwriting recognition have been proposed and applied to various types of word images. This paper presents a modified segmentation algorithm based on bounding boxes that aims to improve segmentation accuracy using two main stages: a preprocessing stage and a segmentation stage. The preprocessing stage applies a set of methods, such as noise removal, binarization, skew correction, thinning, and slant correction, that retain the shape of the characters. In the segmentation stage, the modified bounding box algorithm is applied. This algorithm performs a distance analysis on the bounding boxes of two kinds of connected components (CCs): main CCs and auxiliary CCs. The modified algorithm proceeds according to three cases. Cut points are also determined using structural features for character segmentation. The modified bounding box algorithm was successfully tested on 450 word images of Arabic handwritten words. The results were very promising, indicating the efficiency of the suggested approach.
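
The abstract does not state the three cases, the distance measure, or any thresholds, so the following is only a minimal sketch of the general idea of a distance analysis on the bounding boxes of main and auxiliary connected components. Everything specific in it is an assumption made for illustration: the 8-connectivity, the size threshold `min_main_size` used to tell main CCs from auxiliary CCs (such as dots), the horizontal-gap distance, and all function names. It is not the algorithm of the paper.

```python
from collections import deque


def connected_components(img):
    """Find 8-connected foreground regions (pixels equal to 1).

    Returns a list of dicts with the pixel count and the inclusive
    bounding box (top, left, bottom, right) of each component.
    """
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            size, top, left, bottom, right = 0, r, c, r, c
            while queue:
                y, x = queue.popleft()
                size += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append({"size": size, "box": (top, left, bottom, right)})
    return comps


def horizontal_gap(box_a, box_b):
    """Column distance between two boxes; 0 if their column ranges overlap."""
    _, left_a, _, right_a = box_a
    _, left_b, _, right_b = box_b
    return max(0, max(left_a, left_b) - min(right_a, right_b))


def split_main_auxiliary(comps, min_main_size=4):
    """Assumed rule: small components (e.g. dots) are auxiliary, the rest main."""
    main = [c for c in comps if c["size"] >= min_main_size]
    aux = [c for c in comps if c["size"] < min_main_size]
    return main, aux


def attach_auxiliary(main, aux):
    """Pair each auxiliary box with the main box at the smallest horizontal gap."""
    pairs = []
    for a in aux:
        nearest = min(main, key=lambda m: horizontal_gap(a["box"], m["box"]))
        pairs.append((a["box"], nearest["box"]))
    return pairs


# Toy binary word image: one dot above the left of two body strokes.
img = [
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 1, 1, 1],
    [1, 0, 1, 0, 0, 1, 0, 1],
]
main, aux = split_main_auxiliary(connected_components(img))
print(attach_auxiliary(main, aux))
```

In this toy image the dot's column range overlaps the left stroke, so it is paired with that stroke. A real pipeline would run such a step only after the preprocessing stage (binarization, skew and slant correction, thinning) and would still need the structural-feature cut points mentioned above to split touching characters.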
