Abstract

Dimensionality reduction, in this work through feature selection, is an important step in pattern recognition systems. Although there are several conventional dimensionality reduction approaches, such as Principal Component Analysis, Random Projection, and Linear Discriminant Analysis, selecting optimal, effective, and robust features is usually a difficult task. In this paper, a new two-stage approach for dimensionality reduction is proposed. The method is based on one-dimensional and two-dimensional spectrum diagrams of the standard deviation and minimum-to-maximum distributions of the initial feature vector elements. The proposed algorithm is validated in an OCR application on two large standard benchmark handwritten OCR datasets, MNIST and Hoda. Initially, a 133-element feature vector was assembled from the features most commonly used in the literature. The size of this initial feature vector was then reduced from 100% to 59.40% (79 elements) for the MNIST dataset and to 43.61% (58 elements) for the Hoda dataset. At the same time, the accuracy of the OCR systems improved by 2.95% for MNIST and by 4.71% for Hoda. These results show an improvement in system precision compared with the rival approaches, Principal Component Analysis and Random Projection. The proposed technique can also be useful for generating decision rules in pattern recognition systems that use rule-based classifiers.
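The retained-size percentages in the abstract follow directly from the element counts. This short Python check (the helper name `retained_percent` is illustrative, not from the paper) recomputes them from 133, 79 and 58 and also reports how many features each dataset drops:

```python
# Element counts reported in the abstract.
INITIAL = 133
REDUCED = {"MNIST": 79, "Hoda": 58}


def retained_percent(kept, total=INITIAL):
    """Percentage of the initial feature vector that is kept."""
    return 100.0 * kept / total


for name, kept in REDUCED.items():
    removed = INITIAL - kept
    print(f"{name}: {kept}/{INITIAL} = {retained_percent(kept):.2f}% "
          f"({removed} features removed)")
```

It prints 59.40% with 54 features removed for MNIST and 43.61% with 75 features removed for Hoda, matching the reported figures.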

Highlights

  • Pattern recognition (PR) is one of the most attractive branches in the artificial intelligence field

  • Researchers have produced standard benchmark datasets to encourage others to pursue investigations in the PR field and to allow PR systems to be compared under the same conditions

  • Using the proposed 1D SD (standard deviation) and 1D MM (minimum-to-maximum) distribution diagram methods, the initial feature vector was reduced to a smaller one based on the maximum allowable overlap between the spectrum lines, controlled by the threshold T1
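The highlight above describes keeping features according to the maximum allowable overlap between per-class spectrum lines, controlled by a threshold T1. The exact overlap measure is not given in this section, so the following Python sketch rests on loudly stated assumptions: each class's spectrum line for a feature is modeled as an interval, either [min, max] (MM) or mean ± standard deviation (SD); the overlap of two lines is measured as intersection-over-union; and a feature is kept when its largest pairwise class overlap does not exceed T1. The function names `class_intervals`, `overlap_ratio` and `select_features` are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev

# HYPOTHETICAL sketch: the interval model, overlap measure and names are
# assumptions for illustration, not the paper's published definitions.


def class_intervals(X, y, mode="MM"):
    """Map each feature index to {class label: (low, high)}.

    mode="MM": the class's [min, max] range for that feature.
    mode="SD": mean +/- population standard deviation for that feature.
    """
    labels = sorted(set(y))
    intervals = {}
    for j in range(len(X[0])):
        per_class = {}
        for c in labels:
            vals = [row[j] for row, lab in zip(X, y) if lab == c]
            if mode == "MM":
                per_class[c] = (min(vals), max(vals))
            elif mode == "SD":
                m, s = mean(vals), pstdev(vals)
                per_class[c] = (m - s, m + s)
            else:
                raise ValueError("mode must be 'MM' or 'SD'")
        intervals[j] = per_class
    return intervals


def overlap_ratio(a, b):
    """Intersection length divided by union length of two closed intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    if union == 0:
        return 1.0  # both intervals collapse to the same single point
    return inter / union


def select_features(X, y, t1, mode="MM"):
    """Keep features whose largest pairwise class overlap does not exceed t1."""
    kept = []
    for j, per_class in class_intervals(X, y, mode).items():
        worst = max(
            (overlap_ratio(per_class[a], per_class[b])
             for a, b in combinations(sorted(per_class), 2)),
            default=0.0,  # a single class has no pairs to compare
        )
        if worst <= t1:
            kept.append(j)
    return kept


# Toy data: 4 samples, 3 features, 2 classes.
X = [
    [0.0, 5.0, 1.0],
    [0.2, 5.5, 1.2],
    [1.0, 5.2, 3.0],
    [1.1, 5.4, 3.3],
]
y = [0, 0, 1, 1]
print(select_features(X, y, t1=0.1))  # prints [0, 2]
```

In the toy data, feature 1 has class intervals [5.0, 5.5] and [5.2, 5.4], whose overlap ratio is 0.4, so it is dropped at T1 = 0.1 while the two disjoint features survive. Intersection-over-union is only one plausible choice; normalizing by the shorter interval instead would penalize nested intervals more strongly (here it would give 1.0).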

Introduction

Pattern recognition (PR) is one of the most attractive branches in the artificial intelligence field. Efficient techniques for reducing the volume of data, and thereby the overall processing time and memory requirements, are considered more necessary than in the past. Among the various stages of PR systems, feature extraction plays a vital role in building system models, in the recognition process, and in system accuracy [17]. Features are the information fed to the recognizer to build a system model [18]. They should be as insensitive as possible to irrelevant variability in the input, limited in number to permit effective computation of discriminant functions, and not similar, redundant, or repetitive. Features are commonly grouped into four categories: global transformations [19], structural features [20], statistical features [21], and template-based matching [22].
