Cluster-weighted modeling with measurement error in covariates

Shaho Zarei

doi:10.1080/03610926.2024.2311795

Abstract

The cluster-weighted model (CWM) is a model-based clustering approach that uses a mixture of regression models to cluster data points based on both a response variable Y and covariates 𝑿, where the covariates are assumed to be random. The Gaussian CWM (GCWM) is the most commonly used member of the CWM family; it adopts the Gaussian distribution both for the covariates and for the response given the covariates. In a mixture of regressions, data points are assigned to clusters based on the conditional distribution of the response given the covariates, independently of the covariates’ distribution. In the CWM, the covariates’ distribution is also used to assign data points to clusters, in order to improve clustering performance. Existing research on CWMs is limited to directly observed covariates, which may not reflect real-world scenarios in which measurement errors (MEs) occur. Measurement error can lead to inconsistent parameter estimates and, consequently, to spurious or obscured clusters. In this article, we assume that the random covariates 𝑿 are latent and are observed with an independent, Gaussian-distributed ME. A new generalized expectation-maximization (GEM) algorithm is defined for estimating the model parameters. The performance of the proposed model is illustrated and compared with that of the GCWM on both simulated and real data.
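To make the clustering idea concrete, the following is a minimal sketch of how cluster membership probabilities (the E-step responsibilities) could be computed from the observed covariates when they carry measurement error. It is not the author's GEM algorithm. It assumes, for illustration only, that within cluster g the latent covariates follow X ~ N(mu_g, Sigma_g), the response follows a linear Gaussian regression Y | X ~ N(b0_g + b_g'X, s2_g), and the observed surrogate is W = X + U with U ~ N(0, Sigma_u) independent of (X, Y) and Sigma_u known. Under these assumptions (W, Y) is jointly Gaussian within each cluster, so each observation's cluster weight uses both W and Y, as in a CWM. The function names (`mvn_logpdf`, `joint_params`, `responsibilities`) and the toy parameter values are hypothetical.

```python
import numpy as np


def mvn_logpdf(z, mean, cov):
    """Log-density of N(mean, cov) evaluated at each row of z (shape n x d)."""
    d = mean.shape[0]
    diff = z - mean
    _, logdet = np.linalg.slogdet(cov)
    sol = np.linalg.solve(cov, diff.T).T          # cov^{-1} (z - mean), row-wise
    maha = np.sum(diff * sol, axis=1)              # squared Mahalanobis distances
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def joint_params(mu, Sigma, b0, b, s2, Sigma_u):
    """Mean and covariance of (W, Y) in one cluster, ASSUMING
    X ~ N(mu, Sigma), Y | X ~ N(b0 + b'X, s2), W = X + U, U ~ N(0, Sigma_u),
    with U independent of (X, Y). With Sigma_u = 0 this is the GCWM joint density."""
    p = mu.shape[0]
    mean = np.append(mu, b0 + b @ mu)
    cov = np.empty((p + 1, p + 1))
    cov[:p, :p] = Sigma + Sigma_u                  # Var(W): ME inflates the covariate spread
    cov[:p, p] = Sigma @ b                         # Cov(W, Y) = Cov(X, Y), since U is independent
    cov[p, :p] = Sigma @ b
    cov[p, p] = b @ Sigma @ b + s2                 # Var(Y)
    return mean, cov


def responsibilities(W, y, params, Sigma_u):
    """Posterior cluster probabilities for observed (W, y) under the assumed model."""
    Z = np.column_stack([W, y])
    logp = np.column_stack([
        np.log(g["pi"]) + mvn_logpdf(
            Z, *joint_params(g["mu"], g["Sigma"], g["b0"], g["b"], g["s2"], Sigma_u))
        for g in params
    ])
    logp -= logp.max(axis=1, keepdims=True)        # stabilize before exponentiating
    r = np.exp(logp)
    return r / r.sum(axis=1, keepdims=True)


# Toy two-cluster example with one covariate (hypothetical parameter values).
params = [
    dict(pi=0.5, mu=np.array([0.0]), Sigma=np.array([[1.0]]), b0=0.0, b=np.array([2.0]), s2=0.25),
    dict(pi=0.5, mu=np.array([4.0]), Sigma=np.array([[1.0]]), b0=1.0, b=np.array([-1.0]), s2=0.25),
]
Sigma_u = np.array([[0.5]])                        # assumed known ME covariance
W = np.array([[0.1], [3.9]])                       # error-prone covariate observations
y = np.array([0.2, -2.9])
R = responsibilities(W, y, params, Sigma_u)
```

Setting `Sigma_u` to a zero matrix recovers the ordinary GCWM weights, since the joint Gaussian of (X, Y) factorizes as φ(x; mu_g, Sigma_g) φ(y; b0_g + b_g'x, s2_g). A full GEM algorithm would alternate such an E-step with (partial) maximization updates of the cluster parameters; that part is not sketched here.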
