Performance Analysis of State of the Art Convolutional Neural Network Architectures in Bangla Handwritten Character Recognition

Tapotosh Ghosh, Nasirul Mumenin, Mohammad Abu Yousuf, Hasan Al Banna, Min-Ha-Zul Abedin

doi:10.1134/s1054661821010089

Abstract

Bangla handwritten character recognition is a popular research topic because it is more difficult than character recognition in other languages: compound characters take multiple forms. State-of-the-art convolutional neural network (CNN) architectures are very useful in computer vision applications. Some work has been carried out on Bangla handwritten character recognition, but most of it is either not very efficient or cannot classify a large number of characters. In this work, state-of-the-art pre-trained CNN architectures are used to classify 231 different Bangla handwritten characters from the CMATERdb dataset. The images were first converted to black-and-white form with white as the foreground color, and their size was reduced to 28 × 28. These images were used as input to the CNN architectures. The weights of the state-of-the-art CNN models were kept as they were. The learning rate for training was set to 0.001, and categorical cross-entropy was used as the error function. After 50 epochs, InceptionResNetV2 achieved the best accuracy (96.99%). DenseNet121 and InceptionNetV3 also provided remarkable recognition accuracy (96.55% and 96.20%, respectively). We also considered a combination of the trained InceptionResNetV2, InceptionNetV3, and DenseNet121 architectures, which provided better recognition accuracy (97.69%) than any single CNN architecture, but it is not feasible in practice because it requires a large amount of computation power and memory. The models were also tested on characters that look confusing to humans, and all the architectures showed equal capability in recognizing these images. Considering computational complexity, memory, and the capability to recognize confusing characters, InceptionResNetV2 can be regarded as the best-performing model.
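
As a rough illustration of the steps summarized in the abstract, the Python sketch below binarizes a grayscale image so that the ink is white, resizes it to 28 × 28, averages the class probabilities of several models, and evaluates categorical cross-entropy. It is only a sketch under stated assumptions: the threshold of 128, the dark-ink-on-light-paper input, nearest-neighbour resizing, and simple probability averaging for the combined model are not specified in the abstract, and all function names are hypothetical.

```python
import math


def binarize_white_foreground(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 0/255 with ink in white.

    Assumes dark ink on a light background; the threshold is an assumed value.
    """
    return [[255 if px < threshold else 0 for px in row] for row in gray]


def resize_nearest(img, size=28):
    """Resize to size x size by nearest-neighbour sampling (assumed method)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def ensemble_average(prob_vectors):
    """Average the class-probability vectors produced by several models.

    Averaging is an assumption; the abstract does not say how the three
    trained models were combined.
    """
    n = len(prob_vectors)
    return [sum(col) / n for col in zip(*prob_vectors)]


def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Categorical cross-entropy: -sum_k t_k * log(p_k)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))
```

For example, `resize_nearest(binarize_white_foreground(gray))` yields a 28 × 28 grid of 0/255 values that could be fed to a classifier, and `ensemble_average` takes one probability vector per model for the same image.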

Full Text