Abstract

Many important datasets are affected by the problem of high dimensionality (having a large number of attributes or features), which can result in complex and time-consuming classification models. Feature selection techniques attempt to identify an optimal subset of features, which may improve classification performance as well as identify the features most important to the application at hand. Wrapper feature selection in particular uses a classifier to discover which feature subsets are most useful. However, feature selection can be affected by another dataset problem: imbalanced data. When one class outnumbers the other class(es), the chosen features may not reflect those most important to all classes, especially when wrapper feature selection uses a performance metric which does not consider class imbalance. No previous work has examined how the choice of performance metric within wrapper-based feature selection affects classification performance. To study this effect, in this paper we consider two high-dimensional datasets drawn from the field of Twitter profile mining, both of which exhibit class imbalance. Using the Logistic Regression learner, we perform wrapper feature selection followed by classification, using five different performance metrics (Area Under the Receiver Operating Characteristic Curve, Area Under the Precision-Recall Curve, Best Arithmetic Mean of TPR and TNR, Best Geometric Mean of TPR and TNR, and Overall Accuracy) both for the wrapper and for evaluating the classification model. We find that performance metrics which take class imbalance into account, especially the Area Under the Precision-Recall Curve, are far more effective than Overall Accuracy when used within the wrapper, producing much better performance as evaluated by the metrics which consider imbalance. In fact, even when Overall Accuracy is the classification evaluation metric, it is not the best metric to use within the wrapper. In addition, we find no direct connection between the metric used inside the wrapper and the metric used for classification evaluation: the wrapper metrics show similar performance patterns across all four imbalance-aware evaluation metrics (i.e., every metric except Overall Accuracy).
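To make the methodology concrete, below is a minimal sketch of wrapper feature selection scored by an imbalance-aware metric, assuming a scikit-learn-style workflow in Python. The synthetic dataset, the forward search strategy, the choice of 10 selected features, and the best_geometric_mean helper are all illustrative assumptions rather than the authors' exact experimental setup; scikit-learn's 'average_precision' scorer stands in for the Area Under the Precision-Recall Curve.

```python
# Minimal sketch of wrapper feature selection with an imbalance-aware
# scoring metric. Assumes scikit-learn; the dataset, search strategy,
# and feature counts are illustrative, not the paper's exact setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import cross_val_predict, cross_val_score

# Hypothetical imbalanced, high-dimensional data standing in for the
# Twitter profile mining datasets (90%/10% class split).
X, y = make_classification(n_samples=1000, n_features=50, n_informative=10,
                           weights=[0.9, 0.1], random_state=0)

learner = LogisticRegression(max_iter=1000)

# Wrapper: a forward search over feature subsets, scored by Area Under
# the Precision-Recall Curve ('average_precision'), which accounts for
# class imbalance.
selector = SequentialFeatureSelector(learner, n_features_to_select=10,
                                     direction="forward",
                                     scoring="average_precision", cv=5)
X_selected = selector.fit_transform(X, y)

def best_geometric_mean(y_true, y_score):
    """One plausible reading of 'Best Geometric Mean of TPR and TNR':
    sweep decision thresholds along the ROC curve and keep the maximum
    sqrt(TPR * TNR), where TNR = 1 - FPR."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    return np.max(np.sqrt(tpr * (1.0 - fpr)))

# Evaluate the post-selection model with imbalance-aware metrics and
# Overall Accuracy for comparison.
for metric in ("average_precision", "roc_auc", "accuracy"):
    score = cross_val_score(learner, X_selected, y, cv=5,
                            scoring=metric).mean()
    print(f"{metric}: {score:.3f}")

# Threshold-based metric computed from cross-validated decision scores.
decision_scores = cross_val_predict(learner, X_selected, y, cv=5,
                                    method="decision_function")
print(f"best_geometric_mean: {best_geometric_mean(y, decision_scores):.3f}")
```

In a full replication, the wrapper's subset search strategy and the exact definitions of the threshold-based metrics would need to follow the paper; SequentialFeatureSelector is only one common wrapper implementation.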
