Abstract

As location information of numerous Internet of Thing (IoT) devices can be recognized through IoT sensor technology, the need for technology to efficiently analyze spatial data is increasing. One of the famous algorithms for classifying dense data into one cluster is Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Existing DBSCAN research focuses on efficiently finding clusters in numeric data or categorical data. In this paper, we propose the novel problem of discovering a set of adjacent clusters among the cluster results derived for each keyword in the keyword-based DBSCAN algorithm. The existing DBSCAN algorithm has a problem in that it is necessary to calculate the number of all cases in order to find adjacent clusters among clusters derived as a result of the algorithm. To solve this problem, we developed the Genetic algorithm-based Keyword Matching DBSCAN (GKM-DBSCAN) algorithm to which the genetic algorithm was applied to discover the set of adjacent clusters among the cluster results derived for each keyword. In order to improve the performance of GKM-DBSCAN, we improved the general genetic algorithm by performing a genetic operation in groups. We conducted extensive experiments on both real and synthetic datasets to show the effectiveness of GKM-DBSCAN than the brute-force method. The experimental results show that GKM-DBSCAN outperforms the brute-force method by up to 21 times. GKM-DBSCAN with the index number binarization (INB) is 1.8 times faster than GKM-DBSCAN with the cluster number binarization (CNB).

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call