Abstract

Multivariate time-dependent data, where multiple features are observed over time for a set of individuals, are increasingly widespread in many application domains. To model these data, we need to account for relations among both time instants and variables and, at the same time, for subject heterogeneity. We propose a new co-clustering methodology for grouping individuals and variables simultaneously, designed to handle both functional and longitudinal data. Our approach borrows some concepts from the curve registration framework by embedding the shape invariant model in the latent block model, estimated via a suitable modification of the SEM-Gibbs algorithm. The resulting procedure allows for several user-defined specifications of the notion of cluster that can be chosen on substantive grounds and provides parsimonious summaries of complex time-dependent data by partitioning data matrices into homogeneous blocks. Along with the explicit modelling of time evolution, these aspects allow for an easy interpretation of the clusters, from which low-dimensional settings may also benefit.

Highlights

  • Time-dependent data, arising when measurements are taken on a set of units at different time occasions, are pervasive in a plethora of different fields

  • We propose a modification of the SEM-Gibbs algorithm, called marginalized SEM-Gibbs (M-SEM), where an additional marginalization step is introduced to account for the random effects

  • A first visual inspection of the time evolutions reveals that the procedure is able to discriminate the pollens according to their seasonality

Summary

Introduction

Time-dependent data, arising when measurements are taken on a set of units at different time occasions, are pervasive in a plethora of different fields. Readers may refer to Rice (2004) for a thorough comparison and discussion of the differences and similarities between functional and longitudinal data analysis. Developments in these areas mainly aim to describe individual-specific curves by properly accounting for the correlation between measurements for each subject. Even in the co-clustering context, the usual dualism between distance-based and density-based strategies can be found.

Modelling Time-Dependent Data
Latent Block Model
Model Specification
Model Estimation
Model Selection
Remarks
Synthetic Data
COVID-19 Evolution Across Countries
Conclusions