A Convolutional Autoencoder based Keyword Spotting in Historical Handwritten Devanagari Documents

Sushma S N, Sharada B

doi:10.1109/icict54344.2022.9850900

Abstract

A large number of valuable historical Devanagari documents are archived in many national libraries and need to be preserved in digital form. In addition, there is a growing demand in the area of document image processing for automation and information extraction from old handwritten documents. Retrieving relevant information from handwritten historical document images would ideally require efficient transcription of the documents. An alternative to accurate document transcription is keyword spotting. The primary application of a keyword spotting system is to build digital libraries that support quick searching and browsing of old manuscripts, helping to preserve the world's cultural heritage. For word spotting in handwritten manuscripts, we propose a convolutional neural network (CNN) architecture based on an autoencoder representation. The proposed technique is demonstrated on historical handwritten Devanagari manuscripts collected from the Oriental Research Institute, University of Mysore, Mysore. The results indicate that the proposed CNN method achieves favorable accuracy on these manuscripts.

Full Text