Clustering compositional data using Dirichlet mixture model.

Samyajoy Pal, Christian Heumann

doi:10.1371/journal.pone.0268438

Abstract

A model-based clustering method for compositional data is explored in this article. Most methods for compositional data analysis require some kind of transformation. The proposed method builds a mixture model using the Dirichlet distribution, which works with the unit-sum constraint. The mixture model uses a hard EM algorithm, modified to overcome the problem of fast convergence with empty clusters. This work includes a rigorous simulation study to evaluate the performance of the proposed method over varied dimensions, numbers of clusters, and degrees of overlap. The performance of the model is also compared with other popular clustering algorithms often used for compositional data analysis (e.g., KMeans, Gaussian mixture model (GMM), Gaussian mixture model with hard EM (Hard GMM), partitioning around medoids (PAM), Clustering Large Applications based on Randomized Search (CLARANS), and Density-Based Spatial Clustering of Applications with Noise (DBSCAN)) on simulated data as well as on two real data problems, one from the business and marketing domain and one from the physical science domain. The study shows promising results, exploiting different distributional patterns of compositional data.
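To make the idea of a Dirichlet mixture fitted by hard EM concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the parameter estimator or the modification used against empty clusters, so both are assumptions here: the sketch uses a simple method-of-moments estimate, and it resets an empty or singleton cluster to a flat Dirichlet as a placeholder. All function names (dirichlet_logpdf, moment_estimate, hard_em, sample_dirichlet) are hypothetical.

```python
import math
import random


def dirichlet_logpdf(x, alpha):
    """Log density of Dirichlet(alpha) at a composition x (positive parts, sum 1)."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


def moment_estimate(rows):
    """Method-of-moments estimate of alpha (an assumed estimator, not the paper's)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    # Var(X_1) = m_1 (1 - m_1) / (alpha_0 + 1), solved for the precision alpha_0.
    var0 = max(sum((r[0] - mean[0]) ** 2 for r in rows) / n, 1e-9)
    precision = max(mean[0] * (1.0 - mean[0]) / var0 - 1.0, 1e-3)
    return [m * precision for m in mean]


def hard_em(data, k, n_iter=50, seed=0):
    """Hard EM: alternate per-cluster fitting and hard reassignment of points."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    labels = [rng.randrange(k) for _ in range(n)]  # random initial partition
    alphas, weights = [], []
    for _ in range(n_iter):
        # M-step: refit each component from the points currently assigned to it.
        alphas, weights = [], []
        for c in range(k):
            members = [x for x, z in zip(data, labels) if z == c]
            if len(members) >= 2:
                alphas.append(moment_estimate(members))
                weights.append(len(members) / n)
            else:
                # Placeholder for empty/singleton clusters (assumption only).
                alphas.append([1.0] * d)
                weights.append(1.0 / n)
        # Classification step: assign each point to its most likely component.
        new_labels = []
        for x in data:
            scores = [math.log(weights[c]) + dirichlet_logpdf(x, alphas[c])
                      for c in range(k)]
            new_labels.append(max(range(k), key=scores.__getitem__))
        if new_labels == labels:  # partition is stable, so stop
            break
        labels = new_labels
    return labels, alphas


def sample_dirichlet(alpha, rng):
    """Draw one composition by normalising independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]


# Toy data: two simulated groups of 3-part compositions.
rng = random.Random(1)
data = [sample_dirichlet([20.0, 5.0, 5.0], rng) for _ in range(100)]
data += [sample_dirichlet([5.0, 5.0, 20.0], rng) for _ in range(100)]
labels, alphas = hard_em(data, k=2)
```

Because the density is evaluated on the simplex itself, no log-ratio transformation is applied before clustering. The sketch assumes strictly positive parts; compositions containing zeros would need separate handling.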

Journal: PLOS ONE
Publication Date: May 18, 2022
Citations: 7
License type: CC BY 4.0

Similar Papers

A comparative evaluation of clustering methods and data sampling techniques in the prediction of reservoir landslide deformation state
Qi Ge ... Hong Yue Sun
Georisk: Assessment and Management of Risk for Engineered Systems and Geohazards | VOL. ahead-of-print
13 Apr 2024

Efficient Approaches for Density-Based Spatial Clustering of Applications with Noise
Pretom Kumar Saha ... Doina Logofatu
01 Jan 2020

Estimating hotspots using a Gaussian mixture model from large-scale taxi GPS trace data
Jin-Jun Tang ... He-Lai Huang
Transportation Safety and Environment | VOL. 1
01 Nov 2019

Detection of open cluster members inside and beyond tidal radius by machine learning methods based on Gaia DR3
M Noormohammadi ... M Khakian Ghomi
Monthly Notices of the Royal Astronomical Society | VOL. 532
13 Jun 2024
