Abstract

To avoid the curse of dimensionality caused by a large number of features, only the most relevant features should be selected. Several scores involving must-link and cannot-link constraints have been proposed to estimate the relevance of features. However, these constraint scores evaluate features one by one and ignore any correlation between them. In addition, they compute distances in the original high-dimensional feature space to evaluate the similarity between samples, so they are themselves affected by the curse of dimensionality. To address these drawbacks, we propose a new constraint score based on a similarity matrix that is computed in the selected feature subspace and that makes it possible to evaluate the relevance of a whole feature subset at once. Experiments on benchmark databases demonstrate the improvement brought by the proposed constraint score in both supervised and semi-supervised learning.
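The abstract does not give the exact formulation, so the following Python sketch is only an illustration of the idea it describes: the similarity between samples is computed in the candidate feature subspace rather than in the full feature space, and the must-link/cannot-link constraints score the subset jointly. The function name subset_constraint_score, the Gaussian kernel, and the additive penalty are assumptions for illustration, not the paper's method.

```python
import numpy as np

def subset_constraint_score(X, subset, ml_pairs, cl_pairs, sigma=1.0):
    """Hypothetical subset-level constraint score (illustrative sketch).

    Similarity is computed in the selected feature subspace X[:, subset]
    instead of the full feature space, so the whole subset is evaluated
    at once.  Lower scores are better: must-link pairs should be similar,
    cannot-link pairs dissimilar.
    """
    Xs = X[:, subset]  # project samples onto the candidate feature subset

    def sim(i, j):
        # Gaussian similarity in the subspace (one plausible choice)
        d2 = np.sum((Xs[i] - Xs[j]) ** 2)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    ml = sum(1.0 - sim(i, j) for i, j in ml_pairs)  # penalize dissimilar must-link pairs
    cl = sum(sim(i, j) for i, j in cl_pairs)        # penalize similar cannot-link pairs
    return ml + cl

# Toy usage: compare two candidate subsets and keep the lower score.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))
ml_pairs = [(0, 1), (2, 3)]   # pairs known to share a class
cl_pairs = [(0, 5), (2, 7)]   # pairs known to differ in class
print(subset_constraint_score(X, [0, 1, 2], ml_pairs, cl_pairs))
print(subset_constraint_score(X, [3, 4], ml_pairs, cl_pairs))
```

In this sketch, evaluating subsets rather than single features lets correlated features be judged together, which is the drawback of feature-by-feature constraint scores that the abstract highlights.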
