Abstract

Feature selection based on supervised learning has been widely studied and applied in machine learning and data mining. Unsupervised feature selection, however, remains a difficult research area because label information is unavailable, especially for clustering tasks. Irrelevant and redundant features in the original data seriously obstruct the discovery of clustering structure and weaken the performance of subsequent classification. To address this problem, this paper proposes an unsupervised feature selection and clustering algorithm based on an evolutionary computing framework. First, a binary differential evolution algorithm is constructed for unsupervised feature selection. Specifically, the individuals of the population characterize feature subspaces, and an improved Laplacian model is designed to measure the local manifold structure of each individual. This yields an approximately optimal manifold structure and the corresponding feature subset. Then, a continuous differential evolution algorithm is executed on the optimized feature subset, with an individual representation strategy and an integrated individual measure function designed for clustering. Moreover, the predicted pseudo-labels are used for classification to further verify the validity of the clustering. The experimental results demonstrate that the proposed framework outperforms most state-of-the-art methods.

Highlights

  • High dimensionality has become increasingly prominent in real-world applications

  • The paper focuses on two parts: unsupervised feature selection based on discrete differential evolution (UFDDE) and a clustering algorithm based on continuous differential evolution (CCDE)

  • To verify the ability of the feature subspace selected by UFDDE to characterize the original data structure, an adaptive clustering algorithm based on continuous differential evolution (CCDE) is proposed in the paper

Summary

INTRODUCTION

Dong: Unsupervised Feature Selection and Clustering Optimization Based on Improved Differential Evolution

High dimensionality has become increasingly prominent in real-world applications. Differential evolution is widely used in feature selection because of its good global search ability [6]. In unsupervised feature selection, most current algorithms use a transformation to map the original high-dimensional space to a new low-dimensional space. This achieves dimensionality reduction, but the resulting feature subset loses the physical meaning of the original data set, which reduces the interpretability of the learning model. Some unsupervised feature selection algorithms analyze clustering performance according to the size of the feature subset, which makes the learning model less adaptive. In response to these issues, a framework for unsupervised feature selection and clustering based on improved differential evolution (UFSCDE) is proposed.

The basic steps of the K-means algorithm are as follows:

1. Select K samples randomly from the original data set as the initial cluster centers.
2. Calculate the distance between each remaining sample and the K cluster centers, and assign each sample to the nearest cluster center.
3. Recalculate the centers of the K clusters.
4. Repeat steps 2 and 3 until the cluster centers no longer change, or until a set number of iterations or an error tolerance is reached.
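The K-means steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in the paper: the function name `kmeans` and the parameters `max_iter`, `tol`, and `seed` are assumptions introduced here, Euclidean distance is assumed, and every sample (including the initial centers) is assigned in step 2, which gives the same result as assigning only the remaining samples.

```python
import math
import random


def kmeans(samples, k, max_iter=100, tol=1e-9, seed=0):
    """Plain K-means on a list of equal-length numeric tuples.

    Returns (centers, labels). `max_iter`, `tol`, and `seed` are
    illustrative parameters, not values taken from the paper.
    """
    rng = random.Random(seed)
    # Step 1: pick K samples at random as the initial cluster centers.
    centers = [tuple(s) for s in rng.sample(samples, k)]
    labels = [0] * len(samples)

    for _ in range(max_iter):
        # Step 2: assign every sample to its nearest center (Euclidean).
        labels = [
            min(range(k), key=lambda j: math.dist(s, centers[j]))
            for s in samples
        ]

        # Step 3: recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # An empty cluster keeps its previous center.
                new_centers.append(centers[j])

        # Step 4: stop once no center moves by more than `tol`,
        # or when the iteration budget runs out.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break

    return centers, labels


# Two well-separated groups of three 2-D points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(data, k=2)
```

Because the initial centers are drawn at random, the result of K-means in general depends on the initialization; the fixed `seed` here only makes the sketch repeatable.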

PROPOSED UNSUPERVISED FEATURE SELECTION ALGORITHM
CROSSOVER OPERATOR BASED ON FITNESS VALUE
INDIVIDUAL SELECTION OPERATION
8: Compute similarity matrix by Z
PROPOSED EVOLUTIONARY CLUSTERING ALGORITHM
Findings
CONCLUSION