Abstract

Limited prior knowledge and randomly chosen initial cluster centers directly affect the accuracy of iterative clustering algorithms. In this paper we propose a new algorithm that computes the initial cluster centers for k-means clustering and the best number of clusters with little prior knowledge, and then optimizes the clustering result. The algorithm constructs a Euclidean distance control factor based on the aggregation density sparse degree to select the initial cluster centers of nonuniform sparse data, and obtains initial data clusters by multidimensional diffusion density distribution. A multiobjective clustering approach based on dynamic cumulative entropy is then adopted to optimize the initial data clusters and the best number of clusters. The experimental results show that the proposed algorithm performs well in obtaining the initial cluster centers for the k-means algorithm and improves the clustering accuracy on nonuniform sparse data by about 5%.

Highlights

  • Clustering is an important discovery technique of exploratory data mining and a common technique for statistical data analysis

  • In the medical field, cancer genomics [1], cell signaling [2], and viral genomes [3] use k-means as a data analysis tool; in the bioinformatics field, bioanalytical chemistry [4], the vibrational spectra of biomolecules [5], and the nervous system [6] use k-means to mine potential information; in the image analysis field, imaging techniques [7] use k-means to partition a given set of points into homogeneous groups; in the pattern recognition field, an automatic system for imbalance diagnosis in wind turbines [8] uses k-means to suggest the optimum number of groups

  • All experiments were performed on an Intel Core i5 machine with a 3.30 GHz CPU and 4.00 GB of random access memory (RAM)


Summary

Introduction

Clustering is an important discovery technique of exploratory data mining and a common technique for statistical data analysis. We propose a new algorithm for nonuniform sparse data clustering based on cascade entropy increase and decrease. It designs a Euclidean distance control factor based on the aggregation density sparse degree, determines the initial cluster centers of nonuniform sparse data, and groups the initial data clusters by multidimensional diffusion density distribution. The optimal initial cluster centers are selected with a decision graph, and k-means is then executed on the complete data set using the multiobjective clustering approach. We then present the execution steps of the proposed initial data clustering, which uses multidimensional diffusion density distribution for k-means clustering. Finally, J1, J2, and J3 are compared; the minimum is the best choice.
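
To make the idea of picking initial centers from a decision graph and then comparing objective values more concrete, here is a minimal Python sketch. It is not the paper's algorithm: the density and distance measures follow the generic density-peaks style of decision graph, the function names (`decision_graph_centers`, `assign`, `kmeans`), the `cutoff_quantile` parameter, and the synthetic data are all hypothetical, and J is assumed to be the within-cluster sum of squared errors, since this summary does not define J1, J2, and J3.

```python
import numpy as np


def decision_graph_centers(X, k, cutoff_quantile=0.1):
    """Pick k initial centers: points that are dense AND far from any denser point.

    Generic density-peaks heuristic, used here as an assumed stand-in for the
    paper's decision graph.
    """
    n = len(X)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Cutoff distance: a quantile of the off-diagonal distances (assumption).
    d_c = np.quantile(dist[np.triu_indices(n, k=1)], cutoff_quantile)
    # Gaussian-kernel local density; subtract each point's self-contribution.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0
    # delta: distance to the nearest point with strictly higher density.
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()
    # Points in the upper-right of the (rho, delta) graph score highest.
    top = np.argsort(rho * delta)[-k:]
    return X[top]


def assign(X, centers):
    """Label each point with the index of its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, init_centers, n_iter=100):
    """Plain Lloyd iterations from the given centers; returns centers, labels, J."""
    centers = init_centers.copy()
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old center
                new_centers[j] = members.mean(axis=0)
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    labels = assign(X, centers)
    # Assumed objective J: within-cluster sum of squared errors.
    J = float(((X - centers[labels]) ** 2).sum())
    return centers, labels, J


# Synthetic, well-separated blobs (illustrative data, not from the paper).
rng = np.random.default_rng(0)
blob_means = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.vstack([m + rng.normal(scale=0.5, size=(50, 2)) for m in blob_means])

# Three candidate initializations, compared by their objective values.
results = [
    kmeans(X, decision_graph_centers(X, k=3, cutoff_quantile=q))
    for q in (0.05, 0.10, 0.20)
]
J_values = [J for _, _, J in results]
best_centers, best_labels, best_J = results[int(np.argmin(J_values))]
print(J_values, best_J)
```

Here the three candidates differ only in the cutoff quantile, which is one plausible way to obtain three objective values to compare; the paper's J1, J2, and J3 may be defined differently. Note that a sum-of-squared-errors objective is only comparable across runs with the same number of clusters, because it shrinks as k grows.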
