Popular Scripts Research Articles

Recognition of numeric postal codes in a multi-script environment is a classical problem in any postal automation system. In such postal documents, determining the script of the handwritten postal code is crucial for subsequently invoking the digit recognizer of the respective script. The current framework attempts to infer the script of the numeric postal code without any bias from the script of the textual part of the address block, as the two might differ in a multi-script environment. The scope of the current work is to recognize postal codes written in any of four popular scripts, viz., Latin, Devanagari, Bangla and Urdu. For this purpose, we first implement a Hough transformation based technique to localize the postal-code blocks in structured postal documents with a defined address block region. Isolated handwritten digit patterns are then extracted from the localized postal-code region. In the next stage of the framework, similarly shaped digit patterns of the four scripts are grouped into 25 clusters. A script-independent unified pattern classifier is then designed to classify the digits of a numeric postal code into one of these 25 clusters. Based on these classification decisions, a rule-based script inference engine infers the script of the numeric postal code. One of the four script-specific classifiers is subsequently invoked to recognize the digit patterns of the corresponding script. A novel quad-tree based image partitioning technique is also developed in this work for effective feature extraction from the digit patterns. The average recognition accuracy over ten-fold cross-validation for the support vector machine (SVM) based 25-class unified pattern classifier is 92.03%. With randomly selected six-digit numeric strings of the four scripts, an average script inference accuracy of 96.72% is achieved. The average ten-fold cross-validation recognition accuracies of the individual SVM classifiers for the Latin, Devanagari, Bangla and Urdu numerals are 95.55%, 95.63%, 97.15% and 96.20%, respectively.
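The script inference stage described in this abstract can be illustrated with a small sketch. The abstract does not give the actual rules or the contents of the 25 clusters, so everything below is an assumption: the `CLUSTER_SCRIPTS` table is invented, and the simple voting rule in the hypothetical `infer_script` function merely stands in for the rule-based engine. The idea shown is only that each digit's cluster label votes for every script whose digit set contains a shape in that cluster, and the script with the most votes over the whole postal code wins.

```python
from collections import Counter

# Hypothetical table: cluster id -> scripts that contain a digit shape in that
# cluster. The real 25-cluster grouping is not given in the abstract; these
# entries are invented purely for illustration.
CLUSTER_SCRIPTS = {
    0: {"Latin", "Devanagari", "Bangla"},
    1: {"Latin"},
    2: {"Devanagari", "Bangla"},
    3: {"Urdu"},
    4: {"Bangla"},
    5: {"Latin", "Urdu"},
}


def infer_script(cluster_ids):
    """Return the script with the most votes, or None if empty or tied.

    cluster_ids holds one cluster label per digit, as a unified classifier
    might produce for the digits of a postal code.
    """
    votes = Counter()
    for cid in cluster_ids:
        # Every script compatible with this cluster receives one vote.
        for script in CLUSTER_SCRIPTS.get(cid, ()):
            votes[script] += 1
    if not votes:
        return None
    best = max(votes.values())
    winners = [s for s, v in votes.items() if v == best]
    # An ambiguous result is reported as None rather than guessed.
    return winners[0] if len(winners) == 1 else None


if __name__ == "__main__":
    # Six cluster labels, one per digit of a six-digit code.
    print(infer_script([0, 2, 4, 2, 0, 4]))
```

Once a script is chosen, the matching script-specific digit classifier would be invoked on the same digit images; a tie (returned as `None` here) would need a fallback that this sketch does not attempt to model.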


An Optical Character Recognition (OCR) approach for printed Arabic script, one of the most popular scripts in the world, is presented in this paper. Developing an OCR system for Arabic script is difficult because Arabic characters are distinct and many structurally similar characters exist in the character set. The proposed approach can be divided into three major steps. The first step is digitization followed by pre-processing, such as segmentation, to detect the slant of each character and correct it. The second step is feature extraction using gray-level matrices. Finally, the K-Nearest-Neighbors algorithm is used for classification. The method was tested using 45 patterns for each Arabic character in different fonts (Simplified Arabic, Tahoma, Traditional Arabic). The sample images were divided into 20 training and 25 test images; images in the test set did not appear in the training set. The method achieves a recognition rate of 90.3%. These results indicate that the method handles the printed Arabic character recognition task efficiently and is a promising technique for recognizing printed Arabic characters.

1. Introduction

Optical character recognition (OCR) deals with the recognition of optically processed characters rather than magnetically processed ones. In a typical OCR system, input characters are read and digitized by an optical scanner. Each character is then located and segmented, and the resulting matrix is fed into a preprocessor. Off-line recognition can be considered the most general case: no special device is required for writing, and signal interpretation is independent of signal generation, as in human recognition [6]. The recognition of Arabic characters has been an area of great interest for many years, and a number of research papers and reports have already been published in this area.

There are several major problems with Arabic character recognition: Arabic characters are distinct and ideographic, and many structurally similar characters exist in the character set (Table 1). Thus, classification criteria are difficult to generate [1][3][6]. The Arabic language has a rich vocabulary. More than 200 million people speak it as their native language, and over 1 billion people use its character set, for example to write Persian and Urdu. Due to the cursive nature of the script, several characteristics make the recognition of Arabic distinct from the recognition of Latin or Chinese script. Arabic character recognition has been studied since the 1980s. However, in comparison with other scripts, such as Latin, Chinese and Japanese, little work has been conducted on the automatic recognition of Arabic characters [4][5].
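The feature extraction and classification steps named in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: "gray-level matrices" is read here as a gray-level co-occurrence matrix for a single horizontal offset, the three statistics (contrast, energy, homogeneity) are a common but assumed choice, and the function names `glcm`, `glcm_features` and `knn_predict` are hypothetical. Images are assumed to be lists of rows whose pixel values are already quantized to integers in `range(levels)`.

```python
import math
from collections import Counter


def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix for one non-negative pixel offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows - dy):
        for c in range(cols - dx):
            # Count how often gray level a is followed by b at the offset.
            counts[image[r][c]][image[r + dy][c + dx]] += 1
            total += 1
    if total:
        counts = [[v / total for v in row] for row in counts]
    return counts


def glcm_features(image, levels=4):
    """Contrast, energy and homogeneity of the co-occurrence matrix."""
    p = glcm(image, levels)
    contrast = energy = homogeneity = 0.0
    for i in range(levels):
        for j in range(levels):
            contrast += p[i][j] * (i - j) ** 2
            energy += p[i][j] ** 2
            homogeneity += p[i][j] / (1 + abs(i - j))
    return (contrast, energy, homogeneity)


def knn_predict(train, query, k=3):
    """Majority label among the k training vectors closest to query.

    train is a list of (feature_vector, label) pairs.
    """
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    flat = [[0] * 4 for _ in range(4)]       # uniform toy image
    stripes = [[0, 3, 0, 3] for _ in range(4)]  # vertical stripes
    train = [(glcm_features(flat), "flat"),
             (glcm_features(flat), "flat"),
             (glcm_features(stripes), "striped"),
             (glcm_features(stripes), "striped")]
    print(knn_predict(train, glcm_features(stripes), k=3))
```

In a real system the toy images would be replaced by segmented, slant-corrected character images, and the feature vector would likely combine several offsets and more statistics than the three used here.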



Articles published on Popular Scripts

Word-Level Script Identification Using Texture Based Features

Recognition of archaic Lanna handwritten manuscripts using a hybrid bio-inspired algorithm

A survey on optical character recognition for Bangla and Devanagari scripts

A statistical–topological feature combination for recognition of handwritten numerals

Handwritten Script Recognition using DCT, Gabor Filter and Wavelet Features at Line Level

Visual gene developer: a fully programmable bioinformatics software for synthetic gene optimization

A novel framework for automatic sorting of postal documents with multi-script address blocks

Bridging Cultures in the Schools

Recognition of Printed Arabic Character Using Gray-Scale Matrices

»Thinking Ahead«. Fiction as Prediction in Popular Scripts on Political Scenarios

Segmentation of touching characters in printed devnagari and bangla scripts using fuzzy multifactorial analysis

A complete printed Bangla OCR system

Computer recognition of printed Bangla script
