Abstract

Feature selection as a pre-processing stage is essential in many machine learning tasks (such as classification) to reduce data dimensionality, since irrelevant and redundant features can mislead the learning process. Graph-based sparse feature selection has been developed to address this issue. In this paper, a novel graph-based sparse feature selection method is proposed that takes both relevancy and redundancy analysis into account. An empirical loss function combined with an ℓ1-norm regularization term addresses the relevancy issue, while the redundancy issue is handled by introducing a regularization term that favors uncorrelated features. Furthermore, the proposed learning procedure is guided by two different sets of supervision information, given as pairwise must-link (positive) and cannot-link (negative) constraint sets, to select a discriminative feature subset. This guiding information, together with all data points, is encoded in a graph Laplacian matrix that preserves the locality structure of the original data. The graph Laplacian matrix is constructed by two different approaches: the first preserves the structure of the original data guided only by the positive data points (the unique samples in the must-link constraints), while the second applies a normalized adapted affinity matrix that embeds the pairwise must-link and cannot-link constraints together with the neighborhood relationship information. Experimental results on several datasets from the University of California, Irvine machine learning repository, as well as several high-dimensional gene expression datasets, show the efficacy of the proposed methods in classification tasks compared to several strong feature selection methods.
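As an illustration of the kind of construction the abstract describes, the sketch below builds a normalized graph Laplacian from a Gaussian k-nearest-neighbor affinity matrix and then embeds pairwise supervision by forcing must-link pairs to full affinity and cannot-link pairs to zero affinity. The function name, the Gaussian kernel, the k-NN sparsification, and the hard 0/1 constraint weights are all illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def constrained_laplacian(X, must_link, cannot_link, k=5, sigma=1.0):
    """Assumed sketch: normalized Laplacian L = I - D^{-1/2} W D^{-1/2}
    over a Gaussian k-NN affinity W, with must-link pairs forced to
    affinity 1 and cannot-link pairs forced to affinity 0."""
    n = X.shape[0]
    # pairwise squared Euclidean distances
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    # keep only each point's k nearest neighbors (column 0 is the point itself)
    idx = np.argsort(sq, axis=1)[:, 1:k + 1]
    mask = np.zeros_like(W, dtype=bool)
    mask[np.repeat(np.arange(n), k), idx.ravel()] = True
    W = W * (mask | mask.T)          # symmetrize the neighborhood graph
    # embed supervision: must-link pairs get full affinity,
    # cannot-link pairs get none (illustrative hard weights)
    for i, j in must_link:
        W[i, j] = W[j, i] = 1.0
    for i, j in cannot_link:
        W[i, j] = W[j, i] = 0.0
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))  # guard isolated nodes
    return np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
```

Because the affinity matrix stays symmetric after the constraint edits, the resulting Laplacian is symmetric positive semi-definite, which is what a locality-preserving regularizer of this kind typically requires.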
