Abstract

In this paper, we report on an empirical study of several high-dimensional classification problems and show that much of the discriminant information may lie in low-dimensional subspaces. Feature subset selection is achieved either by forward selection or by backward elimination from the full feature space, with support vector machines (SVMs) as base classifiers. These wrapper methods are compared with a filter method of feature selection that uses information gain as the discriminant criterion. Publicly available data sets from text categorization, chemoinformatics, and gene expression analysis illustrate the idea. We find that forward selection systematically outperforms backward elimination at low dimensions when applied to these problems. These observations are known anecdotally in the machine learning community, but here we provide empirical support across a wide range of problems in different domains.
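A minimal sketch of the comparison the abstract describes, assuming scikit-learn as the toolkit and a synthetic dataset standing in for the paper's corpora. The use of SequentialFeatureSelector, LinearSVC, mutual_info_classif, and all parameter values are illustrative choices, not the authors' protocol:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import (SelectKBest, SequentialFeatureSelector,
                                       mutual_info_classif)
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Synthetic stand-in for a high-dimensional problem: 30 features, 5 informative.
X, y = make_classification(n_samples=300, n_features=30, n_informative=5,
                           random_state=0)
svm = LinearSVC(dual=False)  # linear SVM as the base classifier
k = 5  # size of the low-dimensional subspace to select

# Wrapper methods: greedy forward selection and backward elimination,
# each step scored by cross-validated accuracy of the SVM itself.
for direction in ("forward", "backward"):
    sfs = SequentialFeatureSelector(svm, n_features_to_select=k,
                                    direction=direction, cv=5)
    X_sub = sfs.fit_transform(X, y)
    acc = cross_val_score(svm, X_sub, y, cv=5).mean()
    print(f"{direction} wrapper, k={k}: accuracy {acc:.3f}")

# Filter baseline: rank features by mutual information (an information-gain
# style criterion, computed independently of the classifier) and keep the top k.
X_filt = SelectKBest(mutual_info_classif, k=k).fit_transform(X, y)
acc = cross_val_score(svm, X_filt, y, cv=5).mean()
print(f"info-gain filter, k={k}: accuracy {acc:.3f}")
```

LinearSVC with dual=False is chosen purely to keep the sketch fast on a small synthetic set; the base classifiers, data sets, and evaluation protocol actually used are described in the full text.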
