Abstract
In OCR applications, the feature extraction method used to recognize document images plays an important role. Feature extraction methods may be statistical, structural, or based on transforms and series expansions. Structural features are very difficult to extract, particularly in handwritten applications. However, the structural behavior of the strokes in handwritten expressions can also be estimated through statistical methods. In this paper, a feature extraction method is proposed that measures the distribution of black and white pixels representing the various strokes in a character image by computing the weights on all four corners of a pixel due to its neighboring black pixels. The feature is named Neighborhood Pixels Weights (NPW). Its recognition performance is compared, on noisy and noise-free handwritten character images, with that of several feature extraction methods that have generally been used in the literature as secondary feature extraction methods for the recognition of many scripts. The experiments were conducted on 17000 Devanagari handwritten character images using two classifiers: a Probabilistic Neural Network and a k-Nearest Neighbor classifier. The NPW feature performs better than the other features studied here in both noisy and noise-free conditions.

[...] considered. Structural features are based on the geometrical and topological properties of the character under consideration, and these properties may be local or global (1). A character is composed of a number of components in the form of strokes. These strokes may be lines, arcs, curves, etc., and may or may not be connected to each other, depending on the structure of the character. These components are also called stroke primitives and can be extracted from either the skeleton or the contour of a character image. In a structure-based recognition process, the various stroke primitives of a character are extracted and approximated.
The relationships between the various stroke components are then established. It is somewhat difficult to extract and approximate the stroke primitives in a character image because, in printing or writing, strokes may fail to touch where the basic structure of the character requires them to, and may touch where it does not. This approach also requires matching each approximated stroke primitive against stored prototypes, which is not only complex to model but also requires multi-level heuristics. Moreover, these features are extracted from binary images only. The problems faced with structural features can be easily overcome with statistical features, which are based on the statistical distribution of black and white pixels in a character image. These features may be extracted from binary or gray-scale images and are, to some extent, invariant to character distortions and writing styles. They are easy to extract and can be computed at high speed, since only a few arithmetic or logic operations are required at a given pixel; these take little computational time and are not difficult to