Script Identification Research Articles

Since Optical Character Recognition (OCR) engines are usually script-dependent, automatic text recognition in a multi-script environment requires a pre-processing module that identifies the script before the text is passed to the respective OCR engine. The task becomes more challenging for handwritten documents, which remain a less explored research area. In this paper, a line parameter based approach is presented to identify handwritten text written in eight popular scripts, namely Bangla, Devanagari, Gujarati, Gurumukhi, Manipuri, Oriya, Urdu, and Roman. A combination of Hough transform (HT) and Distance transform (DT) is used to extract directional spatial features based on the line parameter. Experiments are performed at word level using multiple classifiers on a dataset of 12000 handwritten word images, and the Multi Layer Perceptron (MLP) classifier is found to perform best, with an identification accuracy of 95.28%. The performance of the present technique is also compared with that of other state-of-the-art script identification methods on the same database.
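As a rough illustration of the line parameters mentioned in the abstract above, the following is a minimal sketch of a standard Hough transform in pure Python. It is not the paper's feature extractor: the function names, the 180-angle resolution, and the toy image are assumptions made for this example, and the Distance transform step is omitted.

```python
import math

def hough_line_accumulator(image, n_theta=180):
    """Vote in (rho, theta) space for every foreground pixel of a binary image.

    `image` is a list of rows holding 0/1 values (hypothetical input format).
    """
    height, width = len(image), len(image[0])
    max_rho = int(math.ceil(math.hypot(height, width)))
    # Row index is rho shifted by max_rho so that negative rho values fit.
    acc = [[0] * n_theta for _ in range(2 * max_rho + 1)]
    for y in range(height):
        for x in range(width):
            if image[y][x]:
                for t in range(n_theta):
                    theta = math.pi * t / n_theta
                    rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
                    acc[rho + max_rho][t] += 1
    return acc, max_rho

def strongest_line(acc, max_rho):
    """Return (rho, theta_index, votes) for one cell with the most votes."""
    votes, r, t = max(
        (votes, r, t)
        for r, row in enumerate(acc)
        for t, votes in enumerate(row)
    )
    return r - max_rho, t, votes

# Toy 8x8 word image with a single horizontal stroke on row 3.
toy_image = [[1 if y == 3 else 0 for _ in range(8)] for y in range(8)]
acc, max_rho = hough_line_accumulator(toy_image)
rho, theta_index, votes = strongest_line(acc, max_rho)
```

In this toy case every stroke pixel votes for the cell at rho = 3 and theta = 90 degrees, which is the kind of strong horizontal response one would expect from a headline stroke in a word image. Several neighbouring angles receive the same number of votes, so the reported angle index should be read as approximate.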

Script identification facilitates many important applications in document/video analysis. This paper investigates a relatively new problem: identifying scripts in natural images. The basic idea is to combine deep features and mid-level representations into a globally trainable deep model. Specifically, a set of deep feature maps is first extracted from the input images by a pre-trained CNN model, from which local deep features are densely collected. Then, discriminative clustering is performed to learn a set of discriminative patterns from these local features. A mid-level representation is obtained by encoding the local features against the learned discriminative patterns (codebook). Finally, the mid-level representations and the deep features are jointly optimized in a deep network. Benefiting from this fine-grained classification strategy, the optimized deep model, termed Discriminative Convolutional Neural Network (DisCNN), is capable of revealing the subtle differences among scripts that are difficult to distinguish, e.g. Chinese and Japanese. In addition, a large-scale dataset containing 16,291 in-the-wild text images in 13 scripts, named SIW-13, is created for evaluation. The method is not limited to identifying scene text images; it also performs effectively on video and document scripts, and requires no preprocessing such as binarization or segmentation and no hand-crafted features. Experimental comparisons on datasets including SIW-13, CVSI-2015 and Multi-Script consistently demonstrate that DisCNN is a state-of-the-art approach for script identification.
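The codebook encoding step described in the abstract above can be pictured with a minimal sketch. It assumes hard nearest-codeword assignment and a normalized histogram, which is a simple stand-in and not the discriminative encoding used by DisCNN; the function name and the toy vectors are hypothetical.

```python
import math

def encode_with_codebook(local_features, codebook):
    """Assign each local feature to its nearest codeword (Euclidean distance)
    and return the normalized histogram of assignments."""
    counts = [0] * len(codebook)
    for feature in local_features:
        distances = [math.dist(feature, codeword) for codeword in codebook]
        counts[distances.index(min(distances))] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [count / total for count in counts]

# Toy example: two codewords and four 2-D local features (made-up values).
codebook = [(0.0, 0.0), (1.0, 1.0)]
local_features = [(0.1, 0.0), (0.9, 1.0), (1.0, 0.8), (0.2, 0.1)]
representation = encode_with_codebook(local_features, codebook)
```

Here two features fall nearest to each codeword, so the resulting representation splits evenly between the two bins. In the model described above, a vector of this kind would be optimized jointly with the deep features rather than computed once and frozen.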

Articles published on Script Identification

Word-Level Multi-Script Indic Document Image Dataset and Baseline Results on Script Identification

Separating Indic Scripts with matra for Effective Handwritten Script Identification in Multi-Script Documents

PSI: Patch-based script identification using non-negative matrix factorization

Improving patch-based scene text script identification with ensembles of conjoined networks

PHDIndic_11: page-level handwritten document image dataset of 11 official Indic scripts for script identification

Script Identification of Multi-Script Documents: a Survey

Writing type, script and language identification in heterogeneous documents

The Identification of Balinese Scripts’ Characters in Papyrus Based on Semantic Feature and K Nearest Neighbor

Line Parameter based Word-Level Indic Script Identification System

Identification of Fraktur and Latin Scripts in German Historical Documents Using Image Texture Analysis

A texture-based approach for word script and nature identification

Language Identification in Document Images

Script Identification Using Gabor Feature and SVM Classifier

Multilingual Artificial Text Extraction and Script Identification from Video Images

A new dataset of word-level offline handwritten numeral images from four official Indic scripts and its benchmarking using image transform fusion

Handwritten Marathi Compound Character Segmentation Using Minutiae Detection Algorithm

Bangla and Oriya Script Lines Identification from Handwritten Document Images in Tri-script Scenario

Script identification in the wild via discriminative convolutional neural network
