Abstract

Some characteristics and weak points of traditional density-based clustering algorithms are analysed in depth, and an improved method based on a density distribution function is put forward. K Nearest Neighbor (KNN) is used to measure the density of each point, and a local maximum density point is defined as a center point. By means of a local scale, the classification is extended from the center point. For each point, a procedure determines whether it is a core point according to a radius scale factor. The classification is then extended once again from the core points until the density falls to a given ratio of the density of the center point. The tests show that the improved algorithm greatly reduces the sensitivity of density-based clustering algorithms to parameters and enhances the clustering effect on high-dimensional data sets with uneven density distribution.
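The first two steps of the abstract (KNN density, then local density maxima as center points) can be illustrated with a minimal sketch. The abstract does not give the density formula, so the inverse mean distance to the k nearest neighbours is used here purely as an assumption; the function names `knn_density` and `center_points` and the sample data are likewise hypothetical.

```python
import math

def knn_density(points, k):
    """Return (densities, neighbours) for every point.

    Assumption: density is the inverse of the mean distance to the
    k nearest neighbours; the paper's exact formula is not given here.
    """
    densities, neighbours = [], []
    for i, p in enumerate(points):
        # Distances from p to every other point, closest first.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        mean_dist = sum(d for d, _ in nearest) / k
        densities.append(1.0 / (mean_dist + 1e-12))  # guard against division by zero
        neighbours.append([j for _, j in nearest])
    return densities, neighbours

def center_points(points, k):
    """Indices of points whose density is not exceeded by any of their k neighbours."""
    densities, neighbours = knn_density(points, k)
    return [
        i for i in range(len(points))
        if all(densities[i] >= densities[j] for j in neighbours[i])
    ]

# Two small square-shaped groups, each with one point in the middle.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
    (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
]
centers = center_points(points, k=3)
print(centers)
```

In this toy data set the middle point of each group is closer to its neighbours than any corner point is, so it is the local density maximum of its group. The later steps (extension by local scale, the core-point test with a radius scale factor, and the density-ratio stopping rule) are not sketched because the abstract does not specify them in enough detail.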

Highlights

  • Some characteristics and weak points of traditional density-based clustering algorithms are analysed in depth, and an improved method based on a density distribution function is put forward

  • OPTICS (Ordering Points to Identify the Clustering Structure) does not directly produce a clustering of a data set; instead, it computes a cluster ordering for automatic and interactive cluster analysis. This ordering represents the density-based clustering structure of the data and contains information equivalent to the density-based clusterings obtained over a broad range of parameter settings

  • To address the problems described above, the improved clustering algorithm based on a density distribution function introduces the idea of local scale (Ergun Bicici, Deniz Yuret, 2007; Zhou Shui-geng, Zhou Aoying, Jin Wen, Fan Ye, Qian Weining, 2000, pp735-744); that is, clustering is carried out with the help of local statistics


Summary

OPTICS

OPTICS (Ordering Points to Identify the Clustering Structure) does not directly produce a clustering of a data set; instead, it computes a cluster ordering for automatic and interactive cluster analysis. This ordering represents the density-based clustering structure of the data and contains information equivalent to the density-based clusterings obtained over a broad range of parameter settings. To construct this set, or ordering, of density-based clusterings, DBSCAN is extended so that a whole range of distance parameter values is processed at the same time. To build the different clusterings simultaneously, objects are processed in a specific order: the object that is density-reachable with respect to the smallest ε is selected first, so that higher-density clusters are completed first. For this purpose, two values are stored for each object: the core-distance and the reachability-distance. OPTICS thus establishes an ordering of the objects in the database and stores the core-distance and a suitable reachability-distance for each object. OPTICS extracts clusters (Chen Yan, Geng Guohua, Zheng jianguo, 2005, pp; Hong Shaorong, Xiao Wenjun, 2008, pp; Ma Shuai, 2003, pp; Rong qiusheng, Yan Junbiao, Guo Guoqiang, 2004, pp12-16)
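The two values stored per object can be made concrete with a small sketch. It follows the usual OPTICS definitions: the core-distance of an object is the distance to its MinPts-th nearest neighbour if that lies within ε, and the reachability-distance of o with respect to p is the larger of p's core-distance and the distance between p and o. The function names, the convention that an object counts as a member of its own neighbourhood, and the sample data are assumptions made for illustration; this is not the full ordering algorithm.

```python
import math

def core_distance(points, i, eps, min_pts):
    """Smallest radius at which point i has min_pts points in its neighbourhood.

    Assumption: the point itself counts as a member of its neighbourhood.
    Returns None (undefined) if that radius exceeds eps.
    """
    dists = sorted(math.dist(points[i], q) for q in points)  # dists[0] is 0.0 (the point itself)
    if len(dists) < min_pts:
        return None
    d = dists[min_pts - 1]
    return d if d <= eps else None

def reachability_distance(points, o, p, eps, min_pts):
    """Reachability-distance of point o with respect to point p.

    Undefined (None) when p is not a core object for the given eps and min_pts.
    """
    cd = core_distance(points, p, eps, min_pts)
    if cd is None:
        return None
    return max(cd, math.dist(points[p], points[o]))

# Three points close together on a line and one far away.
points = [(0, 0), (1, 0), (2, 0), (10, 0)]
eps, min_pts = 3.0, 3

cd_first = core_distance(points, 0, eps, min_pts)
cd_outlier = core_distance(points, 3, eps, min_pts)
rd = reachability_distance(points, 0, 1, eps, min_pts)
print(cd_first, cd_outlier, rd)
```

The isolated point has no core-distance within ε, so no reachability-distance can be measured from it; this is what lets the ordering separate dense regions from sparse ones.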

DENCLUE
Weak Points of OPTICS and DENCLUE
Improved Clustering Algorithm Based on Density Distribution Function
Conclusion
