Abstract

Clustering is one of the most important techniques in data mining and has long attracted considerable attention. Most clustering algorithms still face challenges such as the difficulty of selecting cluster centers, the manual determination of the number of clusters $K$, low clustering accuracy, and uneven clustering efficiency across different data sets. To address the difficulty of choosing cluster centers, this paper proposes a new algorithm for quickly selecting the initial cluster centers. Because cluster centers are generally data points with higher density, a smaller radius threshold, and large distances from one another, the method uses MNN ($M$ nearest neighbors), density, and distance to determine the initial cluster centers. First, the neighborhood radius $r$ of each point is measured from its MNN distances, and the average of all $r$ is denoted $\bar{r}$. Second, the density $\rho$ of each point within the region of radius $\bar{r}$ is calculated. Then, a factor $f$ is defined to describe the probability that a point becomes a cluster center, and the candidates with the largest $f$ are chosen as the initial cluster centers. Finally, the method is tested on 12 groups of typical benchmark data sets and applied to stellar spectral data from the LAMOST survey. Comparison with six other algorithms indicates that the initial cluster centers obtained by this method are of higher quality than those of the six algorithms. Moreover, the initial cluster centers of the spectral data agree well with the actual stellar classifications.
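The three steps described in the abstract can be sketched in code. The abstract does not give the formula for $f$, so this sketch makes several assumptions that are not the paper's confirmed definitions: $r$ is taken as the distance to the $M$-th nearest neighbor, $\rho$ as the number of other points within $\bar{r}$, and $f = \rho \cdot \delta / r$, where $\delta$ (a quantity borrowed from density-peak style clustering) is the distance to the nearest point of higher density. The function name `select_initial_centers` and its parameters are hypothetical.

```python
import math
import random


def select_initial_centers(points, m, k):
    """Rank points by an ASSUMED center factor f and return the top k indices.

    Assumptions (not taken from the paper): r_i is the distance to the m-th
    nearest neighbour, and f_i = rho_i * delta_i / r_i. Requires m < len(points).
    """
    n = len(points)
    # Full pairwise Euclidean distance matrix (O(n^2), fine for a sketch).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Step 1: neighbourhood radius r_i and its average r_bar.
    # Index 0 of each sorted row is the point itself (distance 0).
    r = [sorted(row)[m] for row in dist]
    r_bar = sum(r) / n

    # Step 2: density rho_i = number of other points within r_bar.
    rho = [sum(1 for d in row if d <= r_bar) - 1 for row in dist]

    # Assumed separation term delta_i: distance to the nearest point that
    # comes earlier in a densest-first ordering (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    delta[order[0]] = max(dist[order[0]])  # densest point: farthest distance
    for rank in range(1, n):
        i = order[rank]
        delta[i] = min(dist[i][j] for j in order[:rank])

    # Step 3: assumed factor f; larger means a more likely cluster center.
    f = [rho[i] * delta[i] / max(r[i], 1e-12) for i in range(n)]
    centers = sorted(range(n), key=lambda i: -f[i])[:k]
    return centers, f


# Usage on two synthetic, well-separated Gaussian blobs (made-up data).
random.seed(0)
blob_a = [(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(30)]
blob_b = [(random.gauss(10, 0.5), random.gauss(10, 0.5)) for _ in range(30)]
points = blob_a + blob_b
centers, f = select_initial_centers(points, m=5, k=2)
```

Dividing by $r$ rewards points whose $M$ neighbors lie close by, and multiplying by $\delta$ penalizes candidates that sit next to an already denser point. The paper's actual factor may combine these quantities differently.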

