Attribute reduction algorithm based on combined distance in clustering

Baohua Liang, Zhengyu Lu

doi:10.3233/jifs-222666

Abstract

Attribute reduction is a widely used data preprocessing technique that aims to remove redundant and irrelevant attributes. However, most attribute reduction models use attribute importance as the sole basis for reduction, without considering the relationships between attributes or their impact on classification results. To overcome this shortcoming, this article first defines the distance between samples based on the number of combinations formed by comparing the samples in the same sub-division. Second, from the viewpoint of clustering, and following the principle that distances between points within a cluster should be as small as possible while distances between samples in different clusters should be as large as possible, the combined distance is used to define the importance of attributes. Finally, a new attribute reduction mechanism is proposed based on this attribute importance. Extensive experiments are conducted to verify the performance of the proposed reduction algorithm. The results show that data sets reduced by our algorithm have a prominent advantage in classification accuracy, that the algorithm can effectively reduce the dimensionality of high-dimensional data, and that it provides a new approach to the study of attribute reduction models.

Full Text