Abstract

Sequence analysis is increasingly used for the analysis of social sequences and other multivariate categorical time series data. However, describing, visualizing, and comparing large sequence data is often complex, especially when there are multiple parallel sequences per subject. Hidden (latent) Markov models (HMMs) can detect underlying latent structures and can be used in various longitudinal settings: to account for measurement error, to detect unobservable states, or to compress information across several types of observations. Extending to mixture hidden Markov models (MHMMs) allows clustering data into homogeneous subsets, with or without external covariates. The seqHMM package in R is designed for the efficient modeling of sequences and other categorical time series data containing one or multiple subjects with one or multiple interdependent sequences using HMMs and MHMMs. Other restricted variants of the MHMM can also be fitted, e.g., latent class models, Markov models, mixture Markov models, or even ordinary multinomial regression models, with suitable parameterization of the HMM. Good graphical presentations of data and models are useful throughout the analysis process, from the first glimpse at the data to model fitting and presentation of results. The package provides easy options for plotting parallel sequence data, and proposes visualizing HMMs as directed graphs.

Highlights

  • Social sequence analysis is increasingly used for the analysis of longitudinal data consisting of multiple independent subjects with one or multiple interdependent sequences

  • A simple option is to group sequences beforehand; afterwards, one hidden (latent) Markov model (HMM) is fitted for each cluster

  • Hidden Markov models are useful in various longitudinal settings with categorical observations


Summary

Introduction

Social sequence analysis is increasingly used for the analysis of longitudinal data consisting of multiple independent subjects with one or multiple interdependent sequences (channels). Hidden (latent) Markov models (HMMs) can be used to compress and visualize information in such data. For modeling continuous-time processes as hidden Markov models, the msm package (Jackson 2011) is available. Both hmm.discnp and msm support only single-channel observations. The LMest package (Bartolucci, Pandolfi, and Pennoni 2017) is aimed at panel data with a large number of subjects and a small number of time points. It can be used for hidden Markov modeling of multivariate and multi-channel categorical data, using covariates in emission and transition processes. LMest supports mixed latent Markov models, where the latent process is allowed to vary in different latent subpopulations. This differs from mixture hidden Markov models used in seqHMM, where the emission probabilities vary between groups.

Sequences and sequence analysis
Hidden Markov models
Clustering by mixture hidden Markov models
Important special cases
Package features
Building and fitting models
Visualizing sequence data
Visualizing hidden Markov models
Examples with life course data
Sequence data
Clustering and mixture hidden Markov models
Visualizing mixture hidden Markov models
Conclusion
Notations
