Abstract

Clustering large, high-dimensional datasets effectively and efficiently is a challenging task. The authors addressed this challenge by developing a Scalable and Robust Clustering (SRC) algorithm that hybridizes grid-based and density-based clustering. Whereas data points are assigned to clusters based on their similarity, outliers are the data points whose behavior is dissimilar or abnormal relative to the rest of the points. This paper therefore investigates the applicability of the authors' SRC algorithm to outlier detection (SRC-OD) in large, high-dimensional datasets. A framework is developed to compare the performance of SRC-OD with that of existing outlier detection (OD) algorithms, using the Jaccard similarity metric and execution time. In terms of Jaccard similarity, the results produced by SRC-OD are comparable with those of the Isolation Forest outlier detection algorithm (ISO), which is the best known OD algorithm. SRC-OD is also more scalable than all the other OD algorithms except ISO, because its execution time grows more slowly. Moreover, the SRC-OD framework detects outliers as a by-product of clustering large datasets.
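
As an illustration of the comparison metric named above, the Jaccard similarity of two outlier sets is the size of their intersection divided by the size of their union. The following is a minimal sketch, not the authors' framework: the function name `jaccard_similarity`, the convention for two empty sets, and the index lists are all assumptions made for the example.

```python
def jaccard_similarity(a, b):
    """Return |A ∩ B| / |A ∪ B| for two collections of outlier indices."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty outlier sets count as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical outlier indices flagged by two detectors on the same dataset.
src_od_outliers = [3, 17, 42, 58]
iso_outliers = [3, 17, 42, 99]

score = jaccard_similarity(src_od_outliers, iso_outliers)
print(f"Jaccard similarity: {score:.2f}")
```

A score of 1 means the two detectors flagged exactly the same points, and a score of 0 means they share none.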
