Segmentation‐based recognition system for handwritten Bangla and Devanagari words using conventional classification and transfer learning

Rahul Pramanik, Soumen Bag

doi:10.1049/iet-ipr.2019.0208

Abstract

Offline recognition of handwritten text in Indian regional scripts is a major area of research, as nearly 910 million people in India use such scripts. Most reported work on optical character recognition (OCR) systems for Indian scripts has focused on a single script only, and methodologies capable of handling more than one Indian script have yet to receive focused attention. This has motivated the authors to study and experiment with a recognition system that can handle the two most popular Indian scripts, namely Bangla and Devanagari. The authors propose a system that first detects and corrects skew present in handwritten Bangla and Devanagari words, estimates the headline, and then segments the words into meaningful pseudo-characters. This is followed by the extraction of three different statistical features, which are combined with off-the-shelf classifiers to identify the best-performing combination. Moreover, they employ state-of-the-art convolutional neural network-based transfer learning architectures and compare them with the extracted hand-crafted features. Finally, they combine the identified pseudo-characters to produce the final result. In experiments, the proposed segmentation methodology is found to provide good accuracy when compared with existing methods.

Full Text