Abstract
A new density-based clustering algorithm, RNN-DBSCAN, is presented that uses reverse nearest neighbor counts as an estimate of observation density. Clustering is performed using a DBSCAN-like approach based on $k$ nearest neighbor graph traversals through dense observations. RNN-DBSCAN is preferable to the popular density-based clustering algorithm DBSCAN in two respects: first, problem complexity is reduced to a single parameter (the choice of $k$, the number of nearest neighbors); second, it has an improved ability to handle large variations in cluster density (heterogeneous density). The superiority of RNN-DBSCAN is demonstrated on several artificial and real-world datasets with respect to prior work on reverse nearest neighbor-based clustering approaches (RECORD, IS-DBSCAN, and ISB-DBSCAN), as well as DBSCAN and OPTICS. Each of these clustering approaches is described by a common graph-based interpretation in which clusters of dense observations are defined as connected components, and their computational complexity is discussed. Heuristics for RNN-DBSCAN parameter selection are presented, and the effects of $k$ on RNN-DBSCAN clusterings are discussed. Additionally, to address scalability, an approximate version of RNN-DBSCAN is presented that leverages an existing approximate $k$ nearest neighbor technique.
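As a rough illustration of the ideas summarized above, the Python sketch below estimates density by reverse $k$ nearest neighbor counts and grows clusters by breadth-first traversal of the $k$ nearest neighbor graph from dense observations. It is a simplified sketch under stated assumptions, not the authors' reference implementation. Assumed here: the core criterion $|R_k(x)| \ge k$; expansion from a core point through its $k$ nearest neighbors plus the core points among its reverse neighbors; and the hypothetical helper names `knn`, `reverse_knn`, and `rnn_dbscan_sketch`. Brute-force neighbor search stands in for the approximate technique mentioned above, and the paper's parameter heuristics are omitted.

```python
import math
from collections import deque

NOISE = -1  # label for observations not reached from any core observation


def knn(points, k):
    """Brute-force k nearest neighbors N_k(x), excluding x itself (ties broken by index)."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def reverse_knn(neighbors):
    """R_k(x): indices y such that x is among the k nearest neighbors of y."""
    reverse = [[] for _ in neighbors]
    for y, nbrs in enumerate(neighbors):
        for x in nbrs:
            reverse[x].append(y)
    return reverse


def rnn_dbscan_sketch(points, k):
    """Simplified RNN-DBSCAN-style clustering (hypothetical sketch, not the paper's code)."""
    nbrs = knn(points, k)
    rnbrs = reverse_knn(nbrs)
    # Density estimate = reverse-NN count; ASSUMED core criterion |R_k(x)| >= k.
    core = [len(r) >= k for r in rnbrs]
    labels = [NOISE] * len(points)
    cluster = 0
    for seed in range(len(points)):
        if not core[seed] or labels[seed] != NOISE:
            continue
        labels[seed] = cluster
        queue = deque([seed])  # only core observations are ever enqueued
        while queue:
            x = queue.popleft()
            # ASSUMED neighborhood: N_k(x) plus the core observations in R_k(x).
            reach = nbrs[x] + [y for y in rnbrs[x] if core[y]]
            for y in reach:
                if labels[y] == NOISE:
                    labels[y] = cluster  # non-core points become border points
                    if core[y]:
                        queue.append(y)
        cluster += 1
    return labels


if __name__ == "__main__":
    # Two unit squares 10 apart, plus one distant outlier.
    pts = [(0, 0), (1, 0), (0, 1), (1, 1),
           (10, 0), (11, 0), (10, 1), (11, 1),
           (5, 20)]
    print(rnn_dbscan_sketch(pts, 2))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this toy example every square corner is among the two nearest neighbors of its two adjacent corners, so each corner has two reverse neighbors and is core. The outlier appears in no one's neighbor list and stays noise. Because edges between core observations in this sketch are exactly the (undirected) $k$ nearest neighbor edges, each cluster is a connected component of the core subgraph, with border points attached to the first cluster that reaches them.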
More From: IEEE Transactions on Knowledge and Data Engineering