Abstract

The Gaussian hidden Markov model (HMM) is widely considered for the analysis of heterogeneous continuous multivariate longitudinal data. To robustify this approach with respect to possible elliptical heavy-tailed departures from normality, due to the presence of outliers, spurious points, or noise (collectively referred to as bad points herein), the contaminated Gaussian HMM is introduced here. The contaminated Gaussian distribution represents an elliptical generalization of the Gaussian distribution and allows for automatic detection of bad points in the same natural way as observations are typically assigned to the latent states in the HMM context. Once the model is fitted, each observation has a posterior probability of belonging to a particular state and, inside each state, of being a bad point or not. In addition to the parameters of the classical Gaussian HMM, for each state we have two more parameters, both with a specific and useful interpretation: one controls the proportion of bad points and one specifies their degree of atypicality. A sufficient condition for the identifiability of the model is given, an expectation-conditional maximization algorithm is outlined for parameter estimation, and various operational issues are discussed. Using a large-scale simulation study and an illustrative artificial dataset, we demonstrate the effectiveness of the proposed model in comparison with HMMs of different elliptical distributions, and we also evaluate the performance of some well-known information criteria in selecting the true number of latent states. The model is finally used to fit data on criminal activities in Italian provinces. Supplementary materials for this article are available online.
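The two extra per-state parameters can be illustrated with a minimal sketch of the contaminated Gaussian density for a single state, written in its usual two-component form: a "good" Gaussian and a "bad" Gaussian with the same mean and an inflated covariance. The names `alpha` (proportion of good points, so `1 - alpha` is the proportion of bad points), `eta` (inflation factor greater than 1, the degree of atypicality), and all function names are assumptions made for this illustration, not the paper's notation or code. Within a fitted HMM the same calculation would be applied conditionally on each state.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def gaussian_pdf(x, mu, sigma):
    """Multivariate Gaussian density at x (pure Python, via Cholesky)."""
    p = len(x)
    low = cholesky(sigma)
    # Forward substitution: solve low * z = (x - mu); z'z is the squared
    # Mahalanobis distance of x from mu.
    z = []
    for i in range(p):
        z.append((x[i] - mu[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    maha = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(p))
    return math.exp(-0.5 * (p * math.log(2.0 * math.pi) + log_det + maha))

def inflate(sigma, eta):
    """Covariance of the bad component: eta * sigma (eta > 1 is assumed)."""
    return [[eta * v for v in row] for row in sigma]

def contaminated_gaussian_pdf(x, mu, sigma, alpha, eta):
    """alpha * N(x; mu, sigma) + (1 - alpha) * N(x; mu, eta * sigma)."""
    good = alpha * gaussian_pdf(x, mu, sigma)
    bad = (1.0 - alpha) * gaussian_pdf(x, mu, inflate(sigma, eta))
    return good + bad

def prob_bad(x, mu, sigma, alpha, eta):
    """Posterior probability that x comes from the inflated (bad) component."""
    bad = (1.0 - alpha) * gaussian_pdf(x, mu, inflate(sigma, eta))
    return bad / contaminated_gaussian_pdf(x, mu, sigma, alpha, eta)

# Illustrative (made-up) parameter values for a single bivariate state.
mu = [0.0, 0.0]
sigma = [[1.0, 0.0], [0.0, 1.0]]
alpha, eta = 0.9, 10.0

p_center = prob_bad([0.0, 0.0], mu, sigma, alpha, eta)  # point at the state mean
p_far = prob_bad([5.0, 5.0], mu, sigma, alpha, eta)     # point far from the mean
```

One natural rule, assumed here for illustration, is to flag an observation as bad when this posterior probability exceeds 0.5: the point at the state mean receives a small probability of being bad, while the distant point receives a probability close to 1.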

Highlights

  • Hidden Markov models (HMMs) are the state of the art in the analysis of time-dependent data

  • Farcomeni and Greco (2015) introduce a robust S-estimator, and Bernardi et al. (2014) propose the use of the multivariate t distribution for multivariate financial data in an HMM framework. We extend this branch of the literature, and the standard HMM framework, by introducing a joint approach to time-varying robust clustering and bad-point detection in a longitudinal setting

  • To evaluate the detection of bad points, we report the true positive rate (TPR), measuring the proportion of bad points that are correctly identified as bad points, and the false positive rate (FPR), corresponding to the proportion of good points incorrectly classified as bad points
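The two detection rates defined in the last highlight can be computed as in the following minimal sketch. The function name `detection_rates` and the toy labels are hypothetical, chosen only to make the definitions concrete; they are not taken from the paper's simulation study.

```python
def detection_rates(true_bad, flagged_bad):
    """Return (TPR, FPR) for bad-point detection.

    true_bad[i]    -- True if observation i really is a bad point
    flagged_bad[i] -- True if the fitted model classifies observation i as bad
    """
    if len(true_bad) != len(flagged_bad):
        raise ValueError("inputs must have the same length")
    tp = sum(1 for t, f in zip(true_bad, flagged_bad) if t and f)
    fn = sum(1 for t, f in zip(true_bad, flagged_bad) if t and not f)
    fp = sum(1 for t, f in zip(true_bad, flagged_bad) if not t and f)
    tn = sum(1 for t, f in zip(true_bad, flagged_bad) if not t and not f)
    # TPR: share of truly bad points that are flagged as bad.
    tpr = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    # FPR: share of good points that are wrongly flagged as bad.
    fpr = fp / (fp + tn) if (fp + tn) > 0 else float("nan")
    return tpr, fpr

# Toy example: 2 truly bad points and 3 good points.
truth = [True, True, False, False, False]
flags = [True, False, True, False, False]
tpr, fpr = detection_rates(truth, flags)
```

In the toy example one of the two bad points is detected and one of the three good points is wrongly flagged. A detector is better the closer its TPR is to 1 and its FPR is to 0.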


Summary

Introduction

Hidden Markov models (HMMs) are the state of the art in the analysis of time-dependent data. Attempts to estimate the parameters of mixture models robustly have produced a heterogeneous literature that includes: noise approaches (Banfield and Raftery, 1993; Fraley and Raftery, 2002), i.e. methods that identify a noise component (modelled with a uniform component-specific distribution) while simultaneously clustering the non-noise observations; distance approaches (Rousseeuw and Leroy, 2005; Cerioli, 2010; García-Escudero et al., 2015); and distribution-based robust approaches (Peel and McLachlan, 2000; Andrews and McNicholas, 2012). While all these methods offer important contributions to the topic, the last two approaches do not allow for the direct detection of bad points. The investigation and use of the contaminated Gaussian distribution in a clustering framework is still in its infancy; some results have recently been obtained by Punzo and McNicholas (2014a, 2014b, 2015) in a cross-sectional setting. Using the contaminated Gaussian distribution in place of the Gaussian within each state makes the model much more robust and allows for automatic detection of bad points.

The model
Identifiability
Maximum likelihood estimation
Note on robustness
Detection of bad points and further constraints
Comparison between HMMs of elliptical distributions
Selecting the number of hidden states
Artificial longitudinal blue crabs data
Criminal activities in Italian provinces
Findings
Discussion
