Hierarchical variable clustering via copula-based divergence measures between random vectors

Steven De Keyser, Irène Gijbels

doi:10.1016/j.ijar.2023.109090

Abstract

This article considers rank-invariant clustering of continuous data via copula-based Φ-dependence measures. The general theoretical framework quantifies dependence between random vectors (groups of variables), which then serves as the similarity between variable clusters in an agglomerative hierarchical procedure. Special attention is devoted to meta-elliptical copulas, for which we present an improved kernel estimator of the density generator and a corresponding bandwidth selector. This allows for non-Gaussian similarities that also capture, e.g., tail dependence. Further, a fully non-parametric estimator is considered, enabling cluster detection in contexts where other measures fail. The theory is supported by simulations and a real data example, focusing on cluster analysis of continuous variables.
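To make the overall procedure concrete, the following is a minimal Python sketch of rank-invariant agglomerative variable clustering. It is not the authors' estimator. It assumes a Gaussian copula, under which the mutual information between two groups of variables (the Φ-dependence obtained for Φ(t) = t log t) has a closed form in terms of the normal-scores correlation matrix. The function names `normal_scores`, `gaussian_mi` and `cluster_variables`, the greedy merge rule, and the unnormalized similarity are all illustrative assumptions; the meta-elliptical and fully non-parametric estimators described above are not reproduced here.

```python
import numpy as np
from scipy.stats import norm, rankdata


def normal_scores(X):
    """Map each column to normal scores via its ranks (rank-invariant step)."""
    n = X.shape[0]
    # Pseudo-observations in (0, 1); dividing by n + 1 avoids infinite quantiles.
    U = np.apply_along_axis(rankdata, 0, X) / (n + 1)
    return norm.ppf(U)


def gaussian_mi(R, a, b):
    """Mutual information between variable groups a and b (lists of column
    indices) under a Gaussian copula with correlation matrix R:
    -0.5 * log(det R_ab / (det R_aa * det R_bb))."""
    ab = a + b
    _, logdet_ab = np.linalg.slogdet(R[np.ix_(ab, ab)])
    _, logdet_a = np.linalg.slogdet(R[np.ix_(a, a)])
    _, logdet_b = np.linalg.slogdet(R[np.ix_(b, b)])
    return -0.5 * (logdet_ab - logdet_a - logdet_b)


def cluster_variables(X, n_clusters):
    """Greedy agglomerative clustering of the columns of X: repeatedly merge
    the two clusters with the largest between-group dependence."""
    R = np.corrcoef(normal_scores(X), rowvar=False)
    clusters = [[j] for j in range(X.shape[1])]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for k in range(i + 1, len(clusters)):
                s = gaussian_mi(R, clusters[i], clusters[k])
                if best is None or s > best[0]:
                    best = (s, i, k)
        _, i, k = best
        clusters[i] = clusters[i] + clusters[k]
        del clusters[k]
    return clusters
```

Because only the ranks of each column enter the computation, applying a strictly increasing transformation to any variable (for example `np.exp`) leaves the resulting clusters unchanged. Note that this raw similarity tends to grow with group size; a normalized version may be preferable in practice.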
