A new dataset of word-level offline handwritten numeral images from four official Indic scripts and its benchmarking using image transform fusion

Sk Md Obaidullah, Kaushik Roy, Nibaran Das, Chayan Halder

doi:10.1504/ijiei.2016.074497

Abstract

Developing a handwritten document image dataset is one of the most tedious and time-consuming tasks in optical character recognition (OCR) related experimental work. Special attention needs to be given to feasibility, realism, clarity and similar concerns while collecting real-life data from different writers. A few efforts to develop a handwritten numeral image database (NIdb) can be found in the literature, but each was restricted to a single script, namely the local script of the researchers who prepared the database. In this paper, an approach to developing a word-level handwritten NIdb of four popular Indic scripts, namely Bangla, Devanagari, Roman and Urdu, is proposed. A benchmark result is established for the handwritten numeral script identification (HNSI) problem by applying a novel image transform fusion (ITF) based technique. The proposed dataset will be freely available to researchers for non-commercial use.

Full Text