Abstract

The synergy between data security and high-intensity computing has paved the way for robust anomaly detection schemes, which in turn require efficient data analysis. Data clustering is one of the most important components of data analytics and plays an important role in various Internet of Things (IoT)-enabled applications, such as Industrial IoT, smart grids, and connected vehicles. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is one such clustering technique, widely used to detect anomalies in large-scale data. However, the traditional DBSCAN algorithm suffers from nearest neighbor search and parameter selection problems, which can degrade the performance of any solution deployed in this environment. To close these gaps, this paper proposes a multi-stage model for anomaly detection that rectifies the problems of traditional DBSCAN. In the first stage, the Boruta algorithm is used to select the relevant set of features from the dataset. In the second stage, the firefly algorithm, with a Davies–Bouldin Index based K-medoid approach, is used to perform the partitioning. In the third stage, kernel-based locality-sensitive hashing is used along with traditional DBSCAN to solve the nearest neighbor search problem. Finally, the resulting set of nearest neighbors is used in a k-distance graph to determine the desired parameters for DBSCAN, i.e., Eps (the maximum radius of the neighborhood) and MinPts (the minimum number of points in the Eps neighborhood). Several sets of experiments have been performed on different datasets to demonstrate the effectiveness of the proposed scheme.
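
The final step, reading Eps off a k-distance graph, can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on several assumptions: the neighbor search is brute force (a stand-in for the kernel-based locality-sensitive hashing stage), k is taken equal to MinPts, and the knee of the curve is located with a simple "farthest from the chord" heuristic. The function names `k_distances` and `estimate_eps` are hypothetical.

```python
import math


def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbour, sorted ascending."""
    result = []
    for i, p in enumerate(points):
        # Brute-force search (assumption): stands in for the LSH-based search.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


def estimate_eps(sorted_kdist):
    """Return the k-distance at the knee of the sorted curve.

    The knee is taken as the point farthest from the straight line joining
    the first and last values (a heuristic assumed for this sketch).
    """
    n = len(sorted_kdist)
    y0, y1 = sorted_kdist[0], sorted_kdist[-1]
    norm = math.hypot(n - 1, y1 - y0)
    if norm == 0:  # single point: nothing to compare against
        return sorted_kdist[0]
    best_i, best_gap = 0, -1.0
    for i, y in enumerate(sorted_kdist):
        # Perpendicular distance from (i, y) to the chord (0, y0)-(n-1, y1).
        gap = abs((y1 - y0) * i - (n - 1) * (y - y0)) / norm
        if gap > best_gap:
            best_i, best_gap = i, gap
    return sorted_kdist[best_i]


if __name__ == "__main__":
    # Toy data: one dense 5x5 grid plus two far-away points acting as anomalies.
    grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
    data = grid + [(50.0, 50.0), (-40.0, 60.0)]
    min_pts = 4  # assumed value; k is tied to MinPts in this sketch
    eps = estimate_eps(k_distances(data, min_pts))
    print("MinPts =", min_pts, "Eps =", eps)
```

Points whose k-distance lies to the right of the knee are the candidates DBSCAN would label as noise when run with the resulting Eps and MinPts.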
