Abstract
Handwritten numeral recognition (HNR) is one of the most challenging tasks in optical character recognition (OCR). The OCR process involves both feature extraction and feature selection on data generated by central or distributed sources. Unimportant features in a feature set may lead to the "curse of dimensionality" and cause the recognition system to malfunction. A feature set is considered good if it contains only useful and discriminative features. In this research, the proposed model considers two kinds of features for any handwritten digit or character: the projection distance of the available black pixels from the sharp boundary edges in all four directions (left, right, top and bottom), and the directional longest run length of black pixels along rows, columns, the principal diagonal and the off-diagonal. This feature extraction algorithm is advantageous because it yields fewer features than the zone-based and projection-distance-based approaches, and thus reduces computation cost without compromising classification accuracy. Two levels of experiments have been performed to validate the proposed approach. In the first level, we consider a group of confusing classes, e.g. (0, 6, 8 and 9), and perform one-class classification for target-specific mining using support vector data description (SVDD); in the second level, we consider all classes from 0 to 9 and perform one-class classification. Experiments are performed on a self-generated data set and on the MNIST data set. For both data sets, the proposed model demonstrates better results than the zoning-based and direction-based approaches to feature extraction. This paper uses classification accuracy, training time and feature set size as comparison parameters.
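To make the two feature families concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It assumes a binarized image in which black pixels are stored as 1, measures projection distance as the number of background pixels between an edge and the first black pixel on each row or column, and assigns the full width or height to lines that contain no black pixel. The function names, the toy 5x5 image and these conventions are all illustrative assumptions; the abstract does not specify how the per-line values are combined into the final feature vector.

```python
import numpy as np

def longest_run(line):
    """Length of the longest run of consecutive black (nonzero) pixels."""
    best = current = 0
    for value in line:
        current = current + 1 if value else 0
        best = max(best, current)
    return best

def projection_distances(img):
    """Distance from each edge to the first black pixel, per row or column.

    Assumption: a row/column with no black pixel gets the full width/height.
    """
    h, w = img.shape
    has_row = img.any(axis=1)
    has_col = img.any(axis=0)
    left = np.where(has_row, img.argmax(axis=1), w)
    right = np.where(has_row, img[:, ::-1].argmax(axis=1), w)
    top = np.where(has_col, img.argmax(axis=0), h)
    bottom = np.where(has_col, img[::-1, :].argmax(axis=0), h)
    return left, right, top, bottom

def run_length_features(img):
    """Longest black runs along rows, columns, and the two main diagonals."""
    rows = [longest_run(r) for r in img]
    cols = [longest_run(c) for c in img.T]
    principal = longest_run(np.diagonal(img))
    off = longest_run(np.diagonal(np.fliplr(img)))
    return rows, cols, principal, off

# Hypothetical 5x5 binarized glyph resembling the digit 1 (1 = black pixel).
digit = np.array([
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
], dtype=np.uint8)

left, right, top, bottom = projection_distances(digit)
rows, cols, principal, off = run_length_features(digit)
```

For this toy glyph, the long vertical stroke shows up as a column run of 5, while the row runs stay short. That directional contrast is the kind of discriminative signal the run-length features are meant to capture.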