Abstract

Feature selection is an effective pre-processing step that removes potentially redundant and irrelevant features for various machine learning paradigms, which helps build more understandable machine-learning-based components in hybrid expert systems. The single-label quadratic programming feature selection (QPFS) model is formulated as a QP problem with a unit simplex constraint that simultaneously minimizes feature-feature redundancy and maximizes feature-label relevance. Since its redundancy matrix is not guaranteed to be positive semi-definite, it is estimated by the Nyström low-rank approximation method, so the performance of QPFS depends heavily on the sub-sampling rate and the random permutation used. In this paper, we extend this model, without any approximation, into a regularized version (rQPFS), yielding a strictly convex QP problem whose solution gives a globally optimal subset of features. To curb the extra computational cost incurred by the full redundancy matrix, the Frank–Wolfe method, which has a sub-linear convergence rate, together with its specializations for strictly convex objectives and the unit simplex constraint, is applied to solve rQPFS efficiently. Furthermore, to tackle the multi-label feature selection paradigm, the pruned problem transformation trick is used to evaluate feature-label relevance so that label correlations are described sufficiently. A detailed experimental study on eight data sets shows that our proposed method performs the best, compared with six state-of-the-art multi-label feature selection methods, according to six instance-based performance evaluation measures.

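The abstract does not spell out the rQPFS objective or the solver, so the following is only a minimal sketch of the idea it describes, assuming a standard QPFS-style objective min_x (1-a)/2 x'Qx - a f'x over the unit simplex with a ridge term lam*I added to Q to make it strictly convex; the function and parameter names (rqpfs_frank_wolfe, alpha, lam) are illustrative, not the paper's own. It shows the two Frank-Wolfe specializations the abstract mentions: the linear minimization oracle over the simplex reduces to picking a single vertex, and strict convexity of the quadratic permits an exact line search instead of the generic 2/(t+2) step size.

```python
import numpy as np

def rqpfs_frank_wolfe(Q, f, alpha=0.5, lam=1e-3, iters=500, tol=1e-8):
    """Frank-Wolfe sketch for min_x (1-alpha)/2 x'(Q+lam*I)x - alpha f'x
    over the unit simplex {x : x >= 0, sum(x) = 1}.

    Q   : (d, d) symmetric feature-feature redundancy matrix
    f   : (d,)   feature-label relevance vector
    lam : ridge term making the quadratic strictly convex (the regularization)
    Returns a weight vector on the simplex; larger entries = better features.
    """
    d = len(f)
    H = (1.0 - alpha) * (Q + lam * np.eye(d))   # Hessian of the objective
    x = np.full(d, 1.0 / d)                     # start at the simplex barycenter
    for _ in range(iters):
        g = H @ x - alpha * f                   # gradient at x
        j = int(np.argmin(g))                   # simplex LMO: best vertex e_j
        dvec = -x                               # direction d = e_j - x
        dvec[j] += 1.0
        if -(g @ dvec) < tol:                   # Frank-Wolfe duality gap is small
            break
        denom = dvec @ H @ dvec
        if denom <= 0.0:                        # degenerate direction: stop
            break
        # Exact line search: the objective is quadratic along x + gamma*d
        gamma = float(np.clip(-(g @ dvec) / denom, 0.0, 1.0))
        x = x + gamma * dvec
    return x
```

Under this reading, feature selection amounts to ranking features by the entries of the returned weight vector, e.g. np.argsort(-x)[:k] for the top k; the actual formulation and stopping rule in the paper may differ.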