Abstract

The K-means clustering algorithm is an important unsupervised learning method and plays a significant role in big data processing, computer vision, and other research fields. However, because of its sensitivity to the initial partition, outliers, noise, and other factors, its clustering results in data analysis, image segmentation, and related applications are unstable and lack robustness. Building on the fast global K-means clustering algorithm, this paper proposes an improved K-means clustering algorithm. Through a neighborhood filtering mechanism, points within the neighborhood of an already selected initial cluster center are excluded from the selection of the next initial cluster center, which effectively reduces the randomness of the initial partition and improves its efficiency. Mahalanobis distance is used during clustering to better account for the global structure of the data. Compared with the traditional clustering algorithm and other optimized variants, the results on real data sets are significantly improved.
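The neighborhood filtering idea described above can be sketched as follows. This is an illustrative interpretation, not the paper's exact procedure: the `radius` parameter, the mean-based choice of the first center, and the farthest-point rule for subsequent centers are assumptions made for the sketch.

```python
import numpy as np

def select_initial_centers(X, k, radius):
    """Pick k initial cluster centers; once a center is chosen, points
    within `radius` of it are excluded from later center selection.
    Illustrative sketch: `radius` is a hypothetical tuning parameter."""
    eligible = np.ones(len(X), dtype=bool)  # points still allowed as centers
    centers = []
    # First center: the point closest to the overall mean (one common heuristic).
    first = np.argmin(np.linalg.norm(X - X.mean(axis=0), axis=1))
    centers.append(X[first])
    eligible[np.linalg.norm(X - X[first], axis=1) < radius] = False
    while len(centers) < k and eligible.any():
        # Next center: eligible point farthest from its nearest chosen center.
        dists = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        dists[~eligible] = -np.inf  # filtered-out points cannot be selected
        nxt = np.argmax(dists)
        centers.append(X[nxt])
        eligible[np.linalg.norm(X - X[nxt], axis=1) < radius] = False
    return np.array(centers)
```

Because each selected center suppresses its neighborhood, the chosen initial centers are guaranteed to be mutually farther apart than `radius`, which is what reduces the randomness of the initial partition.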

Highlights

  • With the development of artificial intelligence, researchers have explored more and more application scenarios for intelligent algorithms [1], and various machine learning algorithms have become research hotspots

  • Building on the fast global K-means clustering algorithm, this paper proposes an improved K-means clustering algorithm

  • Clustering experiments were carried out on the traditional K-means algorithm, the fast global K-means algorithm (FGK-means), the fast global K-means algorithm based on neighborhood screening (RFGK-means), and the fast global K-means algorithm based on neighborhood screening and Mahalanobis distance (RMFGK-means), respectively


Summary

Introduction

With the development of artificial intelligence, researchers have explored more and more application scenarios for intelligent algorithms [1], and various machine learning algorithms have become research hotspots. In the traditional K-means algorithm, the number of cluster centers is chosen empirically by inspecting the data, and the initial locations of the cluster centers are random. This makes the algorithm unstable and susceptible to noise and outliers. Paper [5] used residual analysis to obtain the initial cluster centers and the number of clusters automatically from a decision graph, which solves the problem of manually specifying the number of clusters; however, that method is complex to implement and performs poorly on sparsely distributed data sets. Mahalanobis distance [13] is used in the clustering process, which improves the global awareness of the clustering and makes the algorithm better suited to applications such as image processing.
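The Mahalanobis distance mentioned above can be written compactly. A minimal sketch, assuming a sample covariance estimated from the data (the function name and arguments are illustrative, not from the paper):

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of point x from a distribution with the given
    mean and covariance. Unlike Euclidean distance, it accounts for feature
    scales and correlations, giving the 'global' view used in clustering."""
    diff = x - mean
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))
```

With an identity covariance it reduces to the ordinary Euclidean distance; with a non-identity covariance, displacement along a high-variance (or correlated) direction counts for less, which is why it reflects the global shape of a cluster better than Euclidean distance.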

Traditional K-Means Clustering
Fast Global K-Means
Global K-Means Algorithm
Initial Value Filtering Optimizes Fast Global K-Means
Neighborhood Filter
Mahalanobis Distance
Average Error
Method
Experiment and Results
Simulation Result
Experimental Analysis
Conclusion
