Abstract

We introduce dmbc, a new R package that implements a Bayesian algorithm for clustering a set of binary dissimilarity matrices within a model-based framework. Specifically, we consider the case where S matrices are available, each describing the dissimilarities among the same n objects, possibly expressed by S subjects (judges), or measured under different experimental conditions, or with reference to different characteristics of the objects themselves. In particular, we focus on binary dissimilarities, taking values 0 or 1 depending on whether or not two objects are deemed dissimilar. We are interested in analyzing such data using multidimensional scaling (MDS). Unlike standard MDS algorithms, our goal is to cluster the dissimilarity matrices and, simultaneously, to extract an MDS configuration specific to each cluster. To this end, we develop a fully Bayesian three-way MDS approach, where the elements of each dissimilarity matrix are modeled as a mixture of Bernoulli random vectors. The parameter estimates and the MDS configurations are derived using a hybrid Metropolis-Gibbs Markov Chain Monte Carlo algorithm. We also propose a BIC-like criterion for jointly selecting the optimal number of clusters and latent space dimensions. We illustrate our approach using both synthetic data and a publicly available data set taken from the literature. For the sake of efficiency, the core computations in the package are implemented in C/C++. The package also supports the simulation of multiple chains through the parallel package.

Highlights

  • Data consisting of proximity measures, that is, measurements of the pairwise similarity or dissimilarity between n objects, abound in many fields, ranging from psychology, sociology and market research to ecology, demography, economics, genomics and linguistics

  • We rely on Markov Chain Monte Carlo (MCMC) methods for estimation and introduce a BIC-like criterion to jointly select the number of clusters and latent space dimensions

  • A distinctive feature of our approach compared to classic replicated MDS (RMDS) or weighted MDS (WMDS) is that in the latter a single multidimensional scaling (MDS) configuration is extracted, while we obtain a different configuration for each cluster of subjects


Summary

Introduction

Data consisting of proximity measures, that is, measurements of the pairwise similarity or dissimilarity between n objects, abound in many fields, ranging from psychology, sociology and market research to ecology, demography, economics, genomics and linguistics. While this can be reasonable when the analysis is confirmatory, an exploratory approach would instead benefit from a data-driven procedure that identifies the possible clusters and their specific MDS configurations. As already mentioned, this allows one to ignore possibly negligible differences across subjects, to detect groups of subjects sharing similar opinions, and to better shape and identify their most distinctive traits. A distinctive feature of our approach compared to classic replicated MDS (RMDS) or weighted MDS (WMDS) is that the latter extract a single MDS configuration, while we obtain a different configuration for each cluster of subjects. Under this perspective, our proposal can be regarded as an extension of RMDS, because we assume that there are groups of subjects, not known a priori, whose dissimilarities derive from the same cluster-specific MDS configuration.
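To make the data structure concrete, the following is a minimal Python sketch of how S binary dissimilarity matrices could arise from cluster-specific latent configurations. It is an illustration under stated assumptions, not the dmbc implementation: the logistic link between latent Euclidean distance and the probability of a dissimilarity, the per-cluster intercepts, and all function and variable names are hypothetical choices made here, since the text above does not specify them.

```python
import math
import random


def simulate_binary_dissimilarities(configs, intercepts, labels, seed=0):
    """Simulate one symmetric 0/1 dissimilarity matrix per subject.

    configs[g]    : list of n points (tuples), the latent configuration of cluster g
    intercepts[g] : cluster-specific intercept (hypothetical parameter)
    labels[s]     : cluster index of subject s
    """
    rng = random.Random(seed)
    n = len(configs[0])
    matrices = []
    for g in labels:
        z, a = configs[g], intercepts[g]
        d = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                dist = math.dist(z[i], z[j])
                # Assumed logistic link: larger latent distance gives a
                # higher probability that objects i and j are judged dissimilar.
                prob = 1.0 / (1.0 + math.exp(-(a + dist)))
                d[i][j] = d[j][i] = 1 if rng.random() < prob else 0
        matrices.append(d)
    return matrices


# Two clusters of subjects, four objects, two latent dimensions.
configs = [
    [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0), (3.1, 3.0)],  # cluster 0 groups objects {0, 1} and {2, 3}
    [(0.0, 0.0), (3.0, 3.0), (0.1, 0.0), (3.1, 3.0)],  # cluster 1 groups objects {0, 2} and {1, 3}
]
intercepts = [-2.0, -2.0]
labels = [0, 0, 0, 1, 1, 1]  # S = 6 subjects, three per cluster
D = simulate_binary_dissimilarities(configs, intercepts, labels, seed=1)
```

Here subjects in the same cluster share one configuration, so their matrices differ only by Bernoulli noise, while subjects in different clusters disagree systematically about which objects are close. A clustering procedure of the kind described above would aim to recover both `labels` and the two configurations from `D` alone.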

Model specification and estimation
Model estimation
Initialization
Identifiability and post-processing
Model choice
The dmbc package
Object classes
Model fitting
Diagnostics
Exploration and visualization of results
Quantiles for each variable
Model selection
Empirical application
Conclusion
