CREDIBILISTIC FUZZY CLUSTERING BASED ON ANALYSIS OF DATA DISTRIBUTION DENSITY AND THEIR PEAKS

Ye V Bodyanskiy,O V Kalynychenko,I P Pliss,A Yu Shafronenko

doi:10.15588/1607-3274-2022-3-6

Abstract

Context. The task of clustering – classification without a teacher of data arrays occupies a rather important place in Data Mining. To solve this problem, many approaches have been proposed at the moment, differing from each other in a priori assumptions in the studied and analyzed arrays, in the mathematical apparatus that is the basis of certain methods. The solution of clustering problems is complicated by the large dimension of the vectors of the analyzed observations, their distortion of various types. Objective. The purpose of the work is to introduce a fuzzy clustering procedure that combines the advantages of methods based on the analysis of data distribution densities and their peaks, which are characterized by high speed and can work effectively in conditions of classes that overlapping. Method. The method of fuzzy clustering of data arrays, based on the ideas of analyzing the distribution densities of these data, their peaks, and a confidence fuzzy approach has been introduced. The advantage of the proposed approach is to reduce the time for solving optimization problems related to finding attractors of density functions, since the number of calls to the optimization block is determined not by the volume of the analyzed array, but by the number of density peaks of the same array. Results. The method is quite simple in numerical implementation and is not critical to the choice of the optimization procedure. The experimental results confirm the effectiveness of the proposed approach in clustering problems under the condition of cluster intersection and allow us to recommend the proposed method for practical use in solving problems of automatic clustering of large data volumes. Conclusions. The method is quite simple in numerical implementation and is not critical to the choice of the optimization procedure. The advantage of the proposed approach is to reduce the time for solving optimization problems related to finding attractors of density functions, since the number of calls to the optimization block is determined not by the volume of the analyzed array, but by the number of density peaks of the same array. The method is quite simple in numerical implementation and is not critical to the choice of the optimization procedure. The experimental results confirm the effectiveness of the proposed approach in clustering problems under conditions of overlapping clusters.

Full Text