Abstract

Multiscale analysis lets people observe objects or problems from different perspectives, so clustering multiscale data has practical significance. At present, little research addresses how to cluster large-scale data when clustering results for the small-scale datasets have already been obtained. Clustering the large-scale datasets with traditional methods has two disadvantages: (1) the clustering results of the small-scale datasets are not used, and (2) the traditional methods incur more running overhead. To address these shortcomings, this paper proposes a multiscale clustering framework based on DBSCAN. The framework uses DBSCAN to cluster the small-scale datasets and then introduces the Scaling-Up Cluster Centers (SUCC) algorithm, which generates the cluster centers of the large-scale datasets by merging the clustering results of the small-scale datasets rather than mining the raw large-scale datasets. We show experimentally that, compared to the traditional algorithm DBSCAN and the leading algorithms DBSCAN++ and HDBSCAN, SUCC not only provides competitive performance but also reduces computational cost. In addition, under the guidance of experts, SUCC becomes more competitive in accuracy.

Highlights

  • Clustering is a vital data mining and machine learning technique that aims to group similar objects into the same cluster and separate dissimilar objects into different clusters [1]

  • We provide a mathematical model and design a novel algorithm, Scaling-Up Cluster Centers (SUCC), that scales cluster centers up from small scale to large scale and avoids repeated clustering on raw datasets

  • Experimental results show that SUCC is efficient: it reduces runtime while keeping accuracy competitive with traditional methods and the leading algorithms, which must process raw data that is, in most instances, far larger than the set of cluster centers obtained from the small-scale data


Summary

Introduction

Clustering is a vital data mining and machine learning technique that aims to group similar objects into the same cluster and separate dissimilar objects into different clusters [1]. We concentrate on Scaling-Up Cluster Centers (SUCC) from small scale to large scale, which avoids repeated clustering on raw datasets while remaining competitive in efficiency. Clustering the data at each scale with the traditional method, i.e., reclustering, is inefficient; SUCC improves efficiency by computing the cluster centers of the small-scale data and deriving the large-scale clusters from them. We provide a mathematical model (the multiscale clustering framework) and design the SUCC algorithm on top of it. Experimental results show that SUCC is efficient: it reduces runtime while keeping accuracy competitive with traditional methods and the leading algorithms, which must process raw data that is, in most instances, far larger than the set of cluster centers obtained from the small-scale data.
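The idea of deriving large-scale clusters from small-scale cluster centers can be illustrated with a minimal sketch. This is not the authors' SUCC algorithm, whose details are not given in this summary. It assumes that each small-scale dataset already has cluster labels (for example from DBSCAN, with -1 marking noise), that a cluster center is the mean of its points, and that centers closer than a hypothetical `merge_radius` belong to the same large-scale cluster. The function names `cluster_centers` and `merge_centers` are illustrative only.

```python
from math import dist


def cluster_centers(points, labels):
    """Return (center, size) for each cluster; label -1 (noise) is skipped."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        if label == -1:
            continue
        if label not in sums:
            sums[label] = [0.0] * len(point)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], point)]
        counts[label] += 1
    return [
        (tuple(s / counts[label] for s in sums[label]), counts[label])
        for label in sorted(sums)
    ]


def merge_centers(weighted_centers, merge_radius):
    """Merge centers within merge_radius of each other (assumed rule).

    Centers are linked transitively with union-find; each merged center is
    the size-weighted mean of its members. Raw points are never revisited.
    """
    n = len(weighted_centers)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist(weighted_centers[i][0], weighted_centers[j][0]) <= merge_radius:
                parent[find(i)] = find(j)

    groups = {}
    for i, member in enumerate(weighted_centers):
        groups.setdefault(find(i), []).append(member)

    merged = []
    for members in groups.values():
        total = sum(size for _, size in members)
        dim = len(members[0][0])
        center = tuple(
            sum(c[d] * size for c, size in members) / total for d in range(dim)
        )
        merged.append((center, total))
    return sorted(merged)


# Two small-scale datasets with labels assumed to come from a prior clustering.
small_a = cluster_centers([(0, 0), (0, 2), (10, 10), (10, 12)], [0, 0, 1, 1])
small_b = cluster_centers([(1, 1), (1, 3), (50, 50)], [0, 0, -1])
large_scale = merge_centers(small_a + small_b, merge_radius=3.0)
print(large_scale)  # [((0.5, 1.5), 4), ((10.0, 11.0), 2)]
```

The merge step works on three centers instead of seven raw points, which is the source of the runtime saving described above. The merging rule and threshold here are placeholders for whatever criterion the full paper defines.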

Related Work
Problem Description
Proposed Framework
Performance Evaluations
Conclusions
