Hybrid Arabic handwritten character segmentation using CNN and graph theory algorithm

Lamia Berriche, Ashjan Alqahtani, Siwar Rekikr

doi:10.1016/j.jksuci.2023.101872

Abstract

Segmentation of handwritten Arabic words is a challenging step in Arabic optical character recognition (OCR). In this work, we propose a segmentation approach for handwritten Arabic words based on convolutional neural networks (CNNs). After generating candidate segmentation points, we apply a graph theory-based technique to divide each word into sub-words. This step achieves a 96% correct sub-word segmentation rate. We then build a Segmentation Hypothesis Graph (SHG) to generate candidate characters for each sub-word. We use two CNNs: one designed manually, and one obtained by transfer learning from the pretrained AlexNet model. The pretrained model reaches an accuracy of 96.97%. This outperforms both the manually designed CNN and state-of-the-art models trained on the same dataset. Finally, we use recognition confidence to validate the segmentation points. Overall, our approach correctly segments 88% of the words into characters and resolves 100% of the overlapping cases.
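The abstract does not say how the SHG is searched or how recognition confidence selects the final segmentation points. The sketch below is one plausible reading, not the authors' method, and it rests on several assumptions. Candidate segmentation points are the nodes, ordered along the writing direction. An edge between two points is one candidate character. The retained segmentation is the path from the first to the last point that maximizes the product of the recognizer's confidences, computed as a sum of log-confidences with dynamic programming over the acyclic graph. The function `best_segmentation`, the `max_span` parameter, and the confidence table are all hypothetical. In a real system, the `recognize` callback would run a CNN on the image crop between two points.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical recognizer interface: maps a candidate character span
# (start_point, end_point) to (predicted_label, confidence in [0, 1]).
# A real system would crop the image between the two points and run a CNN.
Recognizer = Callable[[int, int], Tuple[str, float]]


def best_segmentation(
    num_points: int,
    recognize: Recognizer,
    max_span: int = 3,
) -> Tuple[List[int], List[str], float]:
    """Choose the path through a segmentation hypothesis graph (assumed
    structure) that maximizes the product of recognition confidences.

    Nodes are candidate segmentation points 0..num_points-1 ordered along the
    writing direction (0 = sub-word start, num_points-1 = sub-word end).
    An edge (i, j) with 0 < j - i <= max_span is one candidate character.
    """
    # best[j] = highest total log-confidence of any path from node 0 to node j
    best = [-math.inf] * num_points
    back: List[Optional[Tuple[int, str]]] = [None] * num_points
    best[0] = 0.0
    for j in range(1, num_points):
        for i in range(max(0, j - max_span), j):
            if best[i] == -math.inf:
                continue  # node i is unreachable
            label, conf = recognize(i, j)
            if conf <= 0.0:
                continue  # recognizer rejects this span as a character
            score = best[i] + math.log(conf)
            if score > best[j]:
                best[j] = score
                back[j] = (i, label)
    if best[-1] == -math.inf:
        raise ValueError("no valid segmentation path")
    # Follow back-pointers from the last node to recover kept points and labels.
    points, labels = [num_points - 1], []
    j = num_points - 1
    while j > 0:
        i, label = back[j]
        labels.append(label)
        points.append(i)
        j = i
    points.reverse()
    labels.reverse()
    return points, labels, math.exp(best[-1])


if __name__ == "__main__":
    # Made-up confidences for a sub-word with 4 candidate points (0..3).
    table: Dict[Tuple[int, int], Tuple[str, float]] = {
        (0, 1): ("a", 0.9), (1, 2): ("b", 0.4), (2, 3): ("c", 0.9),
        (1, 3): ("d", 0.8), (0, 2): ("e", 0.3), (0, 3): ("f", 0.1),
    }
    points, labels, conf = best_segmentation(
        4, lambda i, j: table.get((i, j), ("?", 0.0))
    )
    print(points, labels, round(conf, 3))  # prints: [0, 1, 3] ['a', 'd'] 0.72
```

In this toy example, the path 0 → 1 → 3 (confidence 0.9 × 0.8 = 0.72) wins over the path through every point (0.9 × 0.4 × 0.9 = 0.324). As a result, candidate point 2 is discarded as a segmentation point. This mirrors, under the stated assumptions, the idea of validating segmentation points with recognition confidence. Scoring by a product of confidences is only one choice. A threshold on each edge's confidence would be a simpler alternative.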

Full Text