Abstract

Clustering algorithms have a wide range of applications in data analysis, such as machine learning and data mining. However, data sets often have unbalanced and non-spherical distributions. Clustering by fast search and find of density peaks (DPC) is a density-based clustering algorithm that can identify non-spherical clusters. In real applications, this algorithm and its variants are not very effective at separating unevenly distributed clusters, because they use only one indicator (the distance of neighbor points) to handle inner points and boundary points at the same time. To this end, we introduce a new indicator, named the asymmetry measure, which enhances the ability to find boundary points. We then propose a boundary detection-based density peaks clustering (BDDPC) algorithm that combines these two indicators, so that different clusters are separated from each other accurately and the clustering quality improves. The BDDPC algorithm can cluster both uniformly distributed and unevenly distributed data. In real life, the distribution of high-dimensional data sets is always unbalanced, so this algorithm has very important applications. Experimental results on synthetic and real-world data sets illustrate the effectiveness of our algorithm.

Highlights

  • Clustering, well known as an unsupervised learning technique in machine learning, has been a hot topic because of its importance in many areas of modern scientific research, such as communication science, computer science, and biology

  • To address the issue above, we introduce a new indicator, named the asymmetry measure, which is very effective for finding boundary points

  • Experiments on synthetic data sets: in this subsection, we show the performance of the proposed boundary detection-based density peaks clustering (BDDPC) algorithm on synthetic data sets

Summary

INTRODUCTION

Clustering, well known as an unsupervised learning technique in machine learning, has been a hot topic because of its importance in many areas of modern scientific research, such as communication science, computer science, and biology. The DPC-KNN-PCA [17] was designed based on k nearest neighbors and principal component analysis, which provides an alternative for local density calculation. The advantage of these algorithms is that they can better handle the category of boundary points. To address the issue above, we introduce a new indicator, named the asymmetry measure, which is very effective for finding boundary points. Combining these two indicators, a boundary detection-based density peaks clustering (BDDPC) algorithm is proposed. An effective local density calculation method is designed based on both the distance of neighbor points and the asymmetry measure of data points. This makes it easier for boundary points to be assigned to the correct clusters during the assignment step, especially for clusters with an unbalanced distribution in the data set.
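
For context, the baseline DPC procedure that BDDPC builds on can be sketched as follows. This is a minimal illustration of the standard density-peaks idea (a local density rho, a distance delta to the nearest denser point, and centers chosen by the largest rho * delta). It is not the BDDPC algorithm: it omits the asymmetry measure, whose definition is not given in this summary. The function name `dpc`, the Gaussian kernel, the cutoff parameter `d_c`, and passing the number of clusters explicitly are all assumptions made for this sketch.

```python
import numpy as np


def dpc(points, d_c, n_clusters):
    """Baseline density peaks clustering (illustrative sketch, not BDDPC).

    points     : array-like of shape (n, d)
    d_c        : cutoff distance used as the kernel width (assumed parameter)
    n_clusters : number of centers to pick (assumed to be given by the user)
    """
    X = np.asarray(points, dtype=float)
    n = len(X)

    # Pairwise Euclidean distances, shape (n, n).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # Local density with a Gaussian kernel; subtract 1 to drop the self term.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0

    # Visit points from densest to sparsest.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbor: use its largest distance.
            delta[i] = dist[i].max()
            continue
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]   # distance to the nearest denser point
        parent[i] = j           # remember that point for label propagation

    # Centers are the points with the largest rho * delta. Sorting within the
    # density order keeps the densest point first, even if scores tie.
    gamma = rho * delta
    centers = order[np.argsort(-gamma[order], kind="stable")[:n_clusters]]

    # Every remaining point inherits the label of its nearest denser point.
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return rho, delta, labels
```

A call such as `rho, delta, labels = dpc(X, d_c=0.5, n_clusters=2)` on two well-separated groups of points should give each group its own label. Because this baseline uses only neighbor distances, it reflects the single-indicator limitation described above, not the proposed remedy.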

RELATED WORKS
BDDPC ALGORITHM
EXPERIMENTS
CONCLUSION
