Abstract

Tamil Stone Inscriptions are ancient handwritten documents engraved on stone that contain a veritable mine of information and traditional knowledge. With reference to the authenticated sources, around 65% Indian inscriptions were found in Tamil language. Amidst many inscriptions character recognition studies that have been published for different languages, no significant effort has been taken for Tamil language. There are few high performances online Handwritten Tamil OCRs for modern Tamil alphabets, but they perform with less than 20% accuracy while dealing stone inscription characters since the alphabets are of ancient form and also the difference between background and foreground is very meager in stone inscriptions. Most of the existing character recognition studies for Handwritten Tamil scripts have relied upon the widely used Hidden Markov Model (HMM), in spite of its familiar shortcomings and few reported in ANN. Though character recognition from the images of inscribed documents is challenging because of the complex character structure of the Tamil language scripts and other artifacts like aging and degradation, modern techniques needs to be developed to digitize such inscriptions and preserves them as electronic documents. In this paper, a novel approach is proposed for recognizing the inscription characters from the ancient Tamil stone inscriptions based on two recently developed models of Convolutional Neural Networks and Recurrent Neural Network (RNN). Camera captured stone inscriptions script images are taken as input and enhanced for clarity through various image enhancement techniques like filtering, luminous, erosion, dilation and blurring. Project profile based character segmentation is done for extracting individual characters out of script image. Over segmentation reduction is done for eliminating touching and broken characters. The character recognition is done in twofold. (i) Recognition of single characters (vowels and consonants) using Convolutional Neural Network (ii) Recognition of compound characters using Recurrent Neural Networks-BLSTM model. A meticulous test on large datasets has been performed to evaluate the performance of the proposed approach. Experimental results show that the proposed CNN based system achieved training accuracy of 88% and validation accuracy of 94% is obtained (Included RNN accuracy). The system performance is evaluated on various test cases in each phase and the limitations have been identified.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call