A Bayesian Approach for Model-Based Clustering of Several Binary Dissimilarity Matrices: The dmbc Package in R

Sergio Venturini, Raffaella Piccarreta

doi:10.18637/jss.v100.i16

Abstract

We introduce the new package dmbc, which implements a Bayesian algorithm for clustering a set of binary dissimilarity matrices within a model-based framework. Specifically, we consider the case in which S matrices are available, each describing the dissimilarities among the same n objects, possibly expressed by S subjects (judges), measured under different experimental conditions, or referring to different characteristics of the objects themselves. In particular, we focus on binary dissimilarities, which take the value 0 or 1 depending on whether or not two objects are deemed dissimilar. We are interested in analyzing such data using multidimensional scaling (MDS). Unlike standard MDS algorithms, our goal is to cluster the dissimilarity matrices and, simultaneously, to extract an MDS configuration specific to each cluster. To this end, we develop a fully Bayesian three-way MDS approach, where the elements of each dissimilarity matrix are modeled as a mixture of Bernoulli random vectors. The parameter estimates and the MDS configurations are derived using a hybrid Metropolis-Gibbs Markov Chain Monte Carlo algorithm. We also propose a BIC-like criterion for jointly selecting the optimal number of clusters and latent space dimensions. We illustrate our approach using both synthetic data and a publicly available data set taken from the literature. For the sake of efficiency, the core computations in the package are implemented in C/C++. The package also allows the simulation of multiple chains through the support of the parallel package.

Highlights

Data consisting of proximity measures, that is, measurements of the pairwise similarity or dissimilarity between n objects, abound in many fields, ranging from psychology, sociology, and market research to ecology, demography, economics, genomics, and linguistics.
The authors rely on Markov Chain Monte Carlo (MCMC) methods for estimation and introduce a BIC-like criterion to jointly select the number of clusters and the number of latent space dimensions.
A distinctive feature of our approach compared to classic replicated MDS (RMDS) or weighted MDS (WMDS) is that the latter extract a single multidimensional scaling (MDS) configuration, whereas we obtain a different configuration for each cluster of subjects.

Summary

Introduction

Data consisting of proximity measures, that is, measurements of the pairwise similarity or dissimilarity between n objects, abound in many fields, ranging from psychology, sociology, and market research to ecology, demography, economics, genomics, and linguistics. While this can be reasonable when the analysis is confirmatory, an exploratory approach would instead benefit from a data-driven procedure that identifies the possible clusters and their specific MDS configurations. As already mentioned, this makes it possible to ignore the possibly negligible differences across subjects, to detect groups of subjects sharing similar opinions, and to better shape and identify their most distinctive traits. A distinctive feature of our approach compared to classic RMDS or WMDS is that the latter extract a single MDS configuration, whereas we obtain a different configuration for each cluster of subjects. Under this perspective, our proposal can be regarded as an extension of RMDS, because we assume that there are groups of subjects, not known a priori, whose dissimilarities derive from the same cluster-specific MDS configuration.
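To make the clustering assumption concrete, the following is a minimal Python sketch of how data of this kind could be generated. It is not the dmbc package's R interface, and it is not taken from the paper's code. Each of S subjects is assigned to one of G clusters, each cluster has its own latent configuration of the n objects in p dimensions, and every binary dissimilarity is a Bernoulli draw. The logistic link between latent Euclidean distance and the probability of a 1, the intercept `alpha`, the standard normal latent coordinates, and the function name `simulate_binary_dissimilarities` are all illustrative assumptions.

```python
import math
import random


def simulate_binary_dissimilarities(n, S, G, p=2, alpha=-1.0, seed=0):
    """Simulate S symmetric binary dissimilarity matrices on n objects.

    Hypothetical helper for illustration only: subjects in the same
    cluster share one latent configuration (n points in p dimensions).
    """
    rng = random.Random(seed)
    # One latent configuration per cluster (assumed standard normal coordinates).
    configs = [
        [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
        for _ in range(G)
    ]
    # Cluster membership of each subject, drawn uniformly at random.
    labels = [rng.randrange(G) for _ in range(S)]
    matrices = []
    for s in range(S):
        z = configs[labels[s]]
        d = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                dist = math.dist(z[i], z[j])
                # Assumed logistic link: larger latent distance makes
                # a dissimilarity of 1 more likely.
                prob = 1.0 / (1.0 + math.exp(-(alpha + dist)))
                d[i][j] = d[j][i] = 1 if rng.random() < prob else 0
        matrices.append(d)
    return matrices, labels, configs


mats, labels, configs = simulate_binary_dissimilarities(n=6, S=10, G=2)
```

In this sketch, subjects with the same label draw their matrices from the same set of pairwise probabilities, which is the sense in which their dissimilarities "derive from the same cluster-specific MDS configuration"; a clustering method would try to recover `labels` and `configs` from `mats` alone.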

Model specification and estimation

Model estimation

Initialization

Identifiability and post-processing

Model choice

The dmbc package

Object classes

Model fitting

Diagnostics

Exploration and visualization of results

Quantiles for each variable

Model selection

Empirical application

Conclusion

Journal: Journal of Statistical Software
Publication Date: Jan 1, 2021
License type: CC BY

