Abstract

This chapter discusses techniques for selecting a subset of features from a larger pool of available features. The techniques include outlier removal, data normalization, hypothesis testing, the receiver operating characteristic (ROC) curve, and Fisher's discriminant ratio, among others. The goal is to select those features that are rich in discriminatory information with respect to the classification problem at hand. This is a crucial step in the design of any classification system, as a poor choice of features can severely degrade classifier performance. Selecting highly informative features is an attempt to place the classes far apart from each other in the feature space (large between-class distance) and to position the data points within each class close to each other (small within-class variance). Another major issue in feature selection is choosing the number of features l to be used out of an original m > l. Reducing this number is in line with our goal of avoiding overfitting to the specific training data set and of designing classifiers with good generalization performance, that is, classifiers that perform well when faced with data outside the training set. The choice of l depends heavily on the number of available training patterns, N. Before feature selection techniques can be applied, a preprocessing stage is necessary for “housekeeping” purposes, such as the removal of outlier points and data normalization.
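As a rough illustration of two of the steps mentioned above, the sketch below (not taken from the chapter; it is a minimal NumPy-only example with illustrative names X, y, and l, and toy data) z-score normalizes the training features and then ranks them by Fisher's discriminant ratio, FDR = (μ1 − μ2)² / (σ1² + σ2²), for a two-class problem, keeping the l highest-scoring features.

```python
import numpy as np

def zscore_normalize(X):
    """Normalize each feature (column) to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # guard against constant features
    return (X - mean) / std

def fisher_discriminant_ratio(X, y):
    """Per-feature FDR = (m1 - m2)^2 / (s1^2 + s2^2) for two classes (labels 0 and 1)."""
    X1, X2 = X[y == 0], X[y == 1]
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    v1, v2 = X1.var(axis=0), X2.var(axis=0)
    return (m1 - m2) ** 2 / (v1 + v2 + 1e-12)

# Toy data: N = 100 patterns, m = 5 features, two classes with shifted means.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 5)), rng.normal(1.0, 1.0, (50, 5))])
X[:, 2] = rng.normal(0.0, 1.0, 100)          # feature 2 carries no class information
y = np.array([0] * 50 + [1] * 50)

Xn = zscore_normalize(X)
fdr = fisher_discriminant_ratio(Xn, y)
l = 3
selected = np.argsort(fdr)[::-1][:l]         # keep the l highest-scoring features
print("FDR per feature:", np.round(fdr, 3))
print("Selected feature indices:", selected)
```

In this toy run the uninformative third feature receives a much lower FDR score than the others and is not among the l selected features, mirroring the goal of retaining only features with large between-class distance relative to their within-class variance.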
