K-Means, Ward and Probabilistic Distance-Based Clustering Methods with Contiguity Constraint

Andrzej Młodak

doi:10.1007/s00357-020-09370-5

Abstract

We analyze some possibilities of using a contiguity (neighbourhood) matrix as a constraint in clustering performed by the k-means and Ward methods, as well as by an approach based on distances and probabilistic assignments that aims at solving the multi-facility location problem (MFLP). That is, we propose some special two-stage algorithms that are forms of clustering with a relational constraint. They optimize the division of a set of objects into clusters while respecting the requirement that neighbours have to belong to the same cluster. In the case of probabilistic d-clustering, a relevant modification of its target function is suggested and studied. A versatile simulation study and an empirical analysis verify the practical efficiency of these methods. The quality of clustering is assessed on the basis of indices of homogeneity, heterogeneity and correctness of clusters, as well as the silhouette index. Using these tools and similarity indices (Rand, Peirce, and Sokal and Sneath), it is shown that probabilistic d-clustering can produce better results than Ward’s algorithm. In comparison with the k-means approach, probabilistic d-clustering, although it gives rather similar results, is more robust to the creation of trivial (including empty) clusters and produces results that are less diversified across replications in terms of correctness; that is, it is more predictable from the point of view of clustering quality.
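As a small illustration of one of the similarity indices named in the abstract, the Rand index between two clusterings can be sketched as the share of object pairs on which both clusterings agree (both put the pair in one cluster, or both separate it). The function name `rand_index` and the label-list input format are assumptions made for this sketch and do not come from the paper.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of object pairs treated the same way by two clusterings.

    labels_a, labels_b: sequences of cluster labels, one per object,
    given in the same object order (hypothetical input format).
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    # A pair counts as an agreement when both clusterings group it
    # together, or both clusterings keep it apart.
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)
```

The index depends only on which objects are grouped together, so relabelling the clusters of either partition leaves the value unchanged.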

Highlights

Cluster analysis, aimed at efficiently classifying given multidimensional objects ([...] such as labour market, environmental protection, economic situation) into internally homogeneous and mutually heterogeneous classes, offers many interesting algorithms. They can be of various types, such as translational, hierarchical and probabilistic.
We analyze some possibilities of using a contiguity matrix as a constraint in clustering performed by the k-means and Ward methods, as well as by an approach based on distances and probabilistic assignments that aims at solving the multi-facility location problem (MFLP).
Nowy Świat 4, 62–800 Kalisz, Poland. Journal of Classification (2021) 38:313–352.

Summary

Introduction

Cluster analysis aims at efficiently classifying given multidimensional objects. In the conventional approach formulated by Ben-Israel and Iyigun (2008), the target function of probabilistic assignments and cluster centres is optimized by a specific iterative algorithm based on Weiszfeld’s idea (cf. Weiszfeld 1937). This approach is a type of multi-facility location problem (MFLP), which aims at the simultaneous optimization of the level of assignment and the location of objects. Two appendices are included, containing the proofs of some mathematical results of the paper and the results of additional computations.
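The unconstrained iteration mentioned above can be sketched roughly as follows, assuming Euclidean distances, membership probabilities inversely proportional to the distance from each centre, and a Weiszfeld-type update in which each centre becomes a weighted mean with weights p²/d. The function name `d_clustering`, its parameters and its stopping rule are assumptions made for this sketch. It does not include the contiguity constraint or the modified target function studied in the paper.

```python
import math


def d_clustering(points, centers, iters=100, eps=1e-12, tol=1e-9):
    """Rough sketch of probabilistic d-clustering (no contiguity constraint).

    points, centers: lists of equal-length coordinate tuples.
    Returns (centers, probs); probs[i][k] is the membership probability
    of point i in cluster k, computed in the last pass.
    """
    centers = [list(c) for c in centers]
    dim = len(points[0])
    probs = []
    for _ in range(iters):
        probs, weights = [], []
        for x in points:
            # Distances to every centre, floored to avoid division by zero.
            d = [max(math.dist(x, c), eps) for c in centers]
            inv = [1.0 / dk for dk in d]
            total = sum(inv)
            # p_k is proportional to 1/d_k, so p_k * d_k is the same for all k.
            p = [v / total for v in inv]
            probs.append(p)
            # Weiszfeld-type weights u_k = p_k^2 / d_k.
            weights.append([pk * pk / dk for pk, dk in zip(p, d)])
        # Each centre moves to the weighted mean of all points.
        new_centers = []
        for k in range(len(centers)):
            wsum = sum(w[k] for w in weights)
            new_centers.append(
                [sum(w[k] * x[j] for w, x in zip(weights, points)) / wsum
                 for j in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # stop once the centres have essentially settled
            break
    return centers, probs
```

A hard assignment, if needed, can be read off by taking the most probable cluster for each point. Starting centres that coincide with data points are best avoided in this sketch, because the floored distance gives such a point a very large weight.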

Basic Notions and Assumptions

Part I

Part II

Contiguity Constraint in the Ward Method

Probabilistic d-Clustering

The Neighbourhood Matrix in Clustering Using the Ben-Israel–Iyigun Method

Simulation Study

Method

Empirical Application

Findings

Concluding Remarks

Journal: Journal of Classification
Publication Date: Aug 26, 2020
Citations: 7
License type: open-access
