Analytical review of clustering techniques and proximity measures

Vivek Mehta,Seema Bawa,Jasmeet Singh

doi:10.1007/s10462-020-09840-7

Abstract

One of the most fundamental approaches to learn and understand from any type of data is by organizing it into meaningful groups (or clusters) and then analyzing them, which is a process known as cluster analysis. During this process of grouping, proximity measures play a significant role in deciding the similarity level of two objects. Moreover, before applying any learning algorithm on a dataset, different aspects related to preprocessing such as dealing with the sparsity of data, leveraging the correlation among features and normalizing the scales of different features are required to be considered. In this study, various proximity measures have been discussed and analyzed from the aforementioned aspects. In addition, a theoretical procedure for selecting a proximity measure for clustering purpose is proposed. This procedure can also be used in the process of designing a new proximity measure. Second, clustering algorithms of different categories have been overviewed and experimentally compared for various datasets of different domains. The datasets have been chosen in such a way that they range from a very low number of dimensions to a very high number of dimensions. Finally, the effect of using different proximity measures is analyzed in partitional and hierarchical clustering techniques based on experiments.

Full Text