Identification of Devnagari and Roman Scripts from Multi-script Handwritten Documents

Pawan Kumar Singh, Nibaran Das, Ram Sarkar, Mita Nasipuri, Subhadip Basu

doi:10.1007/978-3-642-45062-4_70

Abstract

In a multilingual country like India, a handwritten text document commonly contains more than one script. This makes such a document difficult to digitize, because the script of each piece of text must be determined before it is fed into a suitable Optical Character Recognition (OCR) system. This paper reports an intelligent feature-based technique that automatically identifies the scripts of handwritten words in a document page written in Devnagari script mixed with Roman script. Word-level script identification is performed by a Multi Layer Perceptron (MLP) based classifier using 39 distinctive features. The technique is tested on 100 handwritten document pages containing both Devnagari and Roman script words, and 99.54% of the words are identified with their true class.

Keywords: Script identification, Multi-script handwritten pages, Optical Character Recognition, Convex-hull feature, MLP classifier

Full Text