Abstract

According to our previous work [1], Particle Swarm Optimization (PSO) based Fuzzy c-means (FCM) methods typically use random initialization and can incur substantial computational cost in processing big data, although PSO facilitates global optimization. This paper further develops and evaluates our data density-pattern based algorithm, which guides initialization in order to improve the computational efficiency of PSO-based FCM. Data density varies over the data space, and data points in high-density areas are more likely to lie near cluster centroids. Building on this observation, our new algorithm attempts to improve computational efficiency by automatically fusing data characteristics around the cluster centroids to initialize the algorithm. We evaluated the method on real and simulated imbalanced big data. It achieved clustering performance comparable to PSO-based FCM in terms of clustering cost, consistency, and accuracy, but was not consistently better than simple FCM. In terms of computational efficiency on imbalanced big data, our method seems comparable to PSO-based methods in iterations and computational time, but neither seems to match simple FCM. Our simulation indicates that the classical PSO-based FCM is slightly better than FCM in computational efficiency, although the clustering performance seems comparable. These findings seem to further support the robustness of FCM in big data processing.
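
The idea of seeding FCM from high-density regions can be illustrated with a minimal sketch. This is not the algorithm evaluated in the paper, whose details are not given in this abstract, and it omits the PSO step entirely. It assumes a k-nearest-neighbour density proxy and a greedy selection rule, both chosen here only for illustration; the function names `density_guided_init` and `fcm` and the parameter `k` are hypothetical. The pairwise distance matrix makes it suitable for small samples only.

```python
import numpy as np


def density_guided_init(X, c, k=10):
    """Pick c initial centroids from high-density points (illustrative heuristic)."""
    # Pairwise squared Euclidean distances, shape (n, n). O(n^2) memory.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Density proxy (assumption): inverse mean distance to the k nearest
    # neighbours, skipping column 0, which is the point itself.
    knn = np.sqrt(np.sort(d2, axis=1)[:, 1:k + 1])
    density = 1.0 / (knn.mean(axis=1) + 1e-12)
    # Greedy selection (assumption): start at the densest point, then repeatedly
    # take the point maximising density times distance to the nearest chosen
    # centre, so that the seeds are dense but spread apart.
    chosen = [int(np.argmax(density))]
    for _ in range(1, c):
        dist_to_chosen = np.sqrt(d2[:, chosen].min(axis=1))
        chosen.append(int(np.argmax(density * dist_to_chosen)))
    return X[chosen].copy()


def fcm(X, centroids, m=2.0, max_iter=100, tol=1e-5):
    """Standard fuzzy c-means updates, started from the given centroids."""
    for it in range(1, max_iter + 1):
        # Distances from every point to every centroid, shape (n, c).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + 1e-12
        # Membership update: u_ij is proportional to d_ij^(-2/(m-1)),
        # normalised so each row sums to 1.
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # Centroid update: mean of the data weighted by u_ij^m.
        Um = U ** m
        new_centroids = (Um.T @ X) / Um.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, U, it
```

Calling `fcm(X, density_guided_init(X, c))` and comparing the returned iteration count against a run started from randomly chosen points gives a rough feel for how much the initialization matters on a given data set.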
