This article presents a system for recognizing handwritten Gurumukhi place names for use in postal automation. Five feature extraction techniques (zoning, horizontal peak extent, vertical peak extent, diagonal, and centroid) are analyzed, and the resulting feature sets are optimized using Principal Component Analysis (PCA). Four classification methods, k-Nearest Neighbor (k-NN), decision tree, random forest, and Convolutional Neural Network (CNN), are used to classify the handwritten word images. To improve the recognition results, the authors apply Bootstrap Aggregation (Bagging) with a majority voting scheme. The experiments use a public benchmark dataset of 40,000 handwritten place-name samples in the Punjabi language, partitioned 70:30, with 70% of the data used for training and the remaining 30% for testing. The system achieves a maximum recognition accuracy of 96.98% with a combination of zoning, vertical peak extent, and diagonal features, and a minimum Mean Squared Error (MSE) of 0.86% with a combination of zoning and horizontal peak extent features, using a majority voting scheme through the ensemble (Bagging) methodology.
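As a rough illustration of three steps named above (zoning features, the 70:30 partition, and Bagging with majority voting), the following minimal sketch uses only the Python standard library. It is not the authors' implementation: the function names, the 2x2 zone grid, the number of bootstrap rounds, and the choice of k-NN as the bagged base learner are all assumptions made for the example, and the PCA step is omitted.

```python
import random
from collections import Counter


def zoning_features(image, zones=2):
    """Foreground-pixel density in each cell of a zones x zones grid.

    `image` is a binary word image given as a list of rows of 0/1 values.
    The grid size is an illustrative assumption.
    """
    height, width = len(image), len(image[0])
    zone_h, zone_w = height // zones, width // zones
    features = []
    for zr in range(zones):
        for zc in range(zones):
            cell = [image[r][c]
                    for r in range(zr * zone_h, (zr + 1) * zone_h)
                    for c in range(zc * zone_w, (zc + 1) * zone_w)]
            features.append(sum(cell) / len(cell))
    return features


def split_70_30(samples, seed=0):
    """Shuffle and split samples into 70% training and 30% testing."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = (len(shuffled) * 7) // 10
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, x, k=3):
    """Plain k-NN: majority label among the k nearest training samples."""
    nearest = sorted(
        train,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def bagging_predict(train, x, n_estimators=11, k=3, seed=0):
    """Bagging: one k-NN vote per bootstrap resample, then majority voting."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_estimators):
        # Bootstrap sample: draw len(train) items with replacement.
        bootstrap = [rng.choice(train) for _ in train]
        votes.append(knn_predict(bootstrap, x, k))
    return Counter(votes).most_common(1)[0][0]
```

Here `train` is a list of `(feature_vector, label)` pairs, so a word image would first be passed through `zoning_features` and then paired with its place-name label. An odd number of bootstrap rounds is used so that a two-class vote cannot tie.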