MDCUT2: a multi-density clustering algorithm with automatic detection of density variation in data with noise

Soumaya Louhichi, Hanêne Ben-Abdallah, Mariem Gzara

doi:10.1007/s10619-018-7253-1

Abstract

Despite their adoption in many applications, density-based clustering algorithms perform poorly on data with varying density and with imbricated and/or adjacent clusters. Lower-density clusters may be classified as outliers, and adjacent or imbricated clusters of different densities may be merged. To address this, the MDCUT algorithm (Multiple Density ClUsTering) (Louhichi et al. in Pattern Recogn Lett 93:48–57, 2017) detects multiple local density parameters to handle density variation in the data. MDCUT extracts local density levels by mathematically analyzing the interpolated k-nearest-neighbors function, then launches a clustering subroutine for each density level to form the clusters of that level. Compared with well-known density-based clustering algorithms, MDCUT achieved good results on artificial datasets. Its main drawback is its sensitivity to the parameter p of the interpolation technique and to the parameter k, the number of nearest neighbors. In this paper, we propose MDCUT2, an extension of MDCUT that automatically detects pairs of values (k_i, ε_i) characterizing the density levels in the data, where k_i and ε_i are, respectively, the number of neighbors and the radius threshold for the ith density level. We evaluate MDCUT2 on well-known datasets against reference density-based clustering algorithms; the extension improves on the classification results previously obtained with MDCUT.
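The abstract does not spell out how MDCUT2 analyzes the interpolated k-nearest-neighbors function, so the following is only a rough, hypothetical sketch of the overall idea: derive several radius thresholds from the sorted k-distance curve, then run a DBSCAN-style clustering subroutine once per level, from the densest level to the sparsest, on the points not yet assigned. The jump-ratio heuristic, the single fixed k shared by all levels (MDCUT2 itself detects a separate k_i per level), and the function names `k_distances`, `density_levels` and `multi_density_cluster` are our assumptions, not the authors' method.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def k_distances(points: Sequence[Point], k: int) -> List[float]:
    """Distance from each point to its k-th nearest neighbour (self excluded)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def density_levels(points: Sequence[Point], k: int, jump_ratio: float = 2.0) -> List[float]:
    """Hypothetical level detection (NOT the MDCUT2 interpolation analysis):
    split the ascending k-distance curve wherever a value exceeds the previous
    one by more than `jump_ratio`, and return one radius threshold per segment,
    ordered from densest to sparsest."""
    kd = sorted(k_distances(points, k))
    eps_levels = []
    for prev, cur in zip(kd, kd[1:]):
        if prev > 0 and cur / prev > jump_ratio:
            eps_levels.append(prev)  # largest k-distance of the current level
    eps_levels.append(kd[-1])
    return eps_levels


def region_query(points: Sequence[Point], idx: int, candidates: List[int], eps: float) -> List[int]:
    """Indices in `candidates` within distance eps of points[idx] (idx included)."""
    return [j for j in candidates if math.dist(points[idx], points[j]) <= eps]


def multi_density_cluster(points: Sequence[Point], k: int, eps_levels: List[float]) -> List[int]:
    """Run a DBSCAN-style pass per density level on still-unassigned points.
    A point is core if it has at least k other free points within eps.
    Returns one cluster id per point; -1 marks noise."""
    labels = [-1] * len(points)
    next_id = 0
    for eps in eps_levels:  # densest level first
        free = [i for i, lab in enumerate(labels) if lab == -1]
        for i in free:
            if labels[i] != -1:
                continue  # already claimed by a cluster of this level
            seeds = region_query(points, i, free, eps)
            if len(seeds) < k + 1:  # not a core point at this level
                continue
            labels[i] = next_id
            queue = list(seeds)
            while queue:  # expand the cluster through core points
                j = queue.pop()
                if labels[j] != -1:
                    continue
                labels[j] = next_id
                neigh = region_query(points, j, free, eps)
                if len(neigh) >= k + 1:
                    queue.extend(neigh)
            next_id += 1
    return labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),  # dense
            (10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0), (10.5, 10.5),  # sparse
            (50.0, 50.0)]  # isolated point
    levels = density_levels(data, k=2)
    print(multi_density_cluster(data, k=2, eps_levels=levels))
    # prints [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]
```

On this toy data the dense square at the origin and the sparse square near (10, 10) are recovered at different radius thresholds, while the isolated point stays labeled as noise. With a single global radius of 0.1, the sparse square would instead be labeled entirely as noise, which is the failure mode the abstract describes for single-density algorithms.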
