Abstract

With the rapid growth of high-dimensional data that mixes labelled and unlabelled samples, semi-supervised feature selection has received much attention in recent years. However, most existing approaches ignore the fuzziness of the data. Moreover, many feature selection methods need to measure the relationships among all samples, which is inefficient and difficult to apply to large-scale data. To address these problems, we propose an effective semi-supervised feature selection method with soft label learning (SFS-SLL). Specifically, we first learn initial soft labels from the local distances between samples and cluster centers using an efficient fuzzy C-means clustering. We then propose a supervised semantic constraint that exploits both labelled and unlabelled data, using the manual labels to guide soft label learning. Next, we propose a simple yet effective sparse regression model that integrates soft label learning and feature selection into a unified framework. Finally, we derive an effective optimization strategy based on the alternating direction method of multipliers (ADMM) to iteratively solve the formulated problem. Experimental results on several benchmark datasets show improvements in feature selection accuracy and efficiency over the compared methods.
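
To make the soft-label initialization step concrete, the following is a minimal sketch of textbook fuzzy C-means, in which each sample's membership in a cluster depends on its distance to that cluster's center. It is an illustration under stated assumptions, not the SFS-SLL implementation: the function name `fuzzy_cmeans_soft_labels`, the fuzzifier `m`, and the stopping rule are hypothetical choices, and the efficient variant and the supervised semantic constraint described in the abstract are not reproduced here.

```python
import numpy as np


def fuzzy_cmeans_soft_labels(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy C-means (illustrative only, not the paper's variant).

    Returns (U, centers): U has shape (n_samples, n_clusters) and each row
    sums to 1, so a row can be read as a soft label for that sample.
    """
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]

    # Random initial memberships, normalized so each row sums to 1.
    U = rng.random((n_samples, n_clusters))
    U /= U.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # Cluster centers are membership-weighted means of the samples.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]

        # Squared Euclidean distance from every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero

        # Membership update: u_ij is proportional to d_ij^(-2 / (m - 1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)

        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    return U, centers
```

Calling `fuzzy_cmeans_soft_labels(X, n_clusters=c)` on an `(n_samples, n_features)` array yields one soft-label row per sample. In a semi-supervised setting, the rows of labelled samples would additionally be guided by their manual labels, which this sketch leaves out.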

