Abstract

Finite state space hidden Markov models are flexible tools to model phenomena with complex time dependencies: any process distribution can be approximated by a hidden Markov model with enough hidden states. We consider the problem of estimating an unknown process distribution using nonparametric hidden Markov models in the misspecified setting, that is, when the data-generating process may not be a hidden Markov model. We show that when the true distribution is exponentially mixing and satisfies a forgetting assumption, the maximum likelihood estimator recovers the best approximation of the true distribution. We prove a finite sample bound on the resulting error and show that it is optimal in the minimax sense (up to logarithmic factors) when the model is well specified.
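For readers unfamiliar with the setting, the following is a minimal sketch of a finite state space hidden Markov model and its likelihood; the notation (number of hidden states K, initial distribution π, transition matrix Q, emission densities γk) is ours and need not match the paper's.

```latex
% Sketch only (notation assumed, not taken from the paper): the hidden chain (X_i)
% is Markov with initial distribution \pi and transition matrix Q, and the
% observations (Y_i) are conditionally independent given the chain, with
% emission densities \gamma_k.
\[
  (X_i)_{i \ge 1} \sim \mathrm{Markov}(\pi, Q), \qquad
  Y_i \mid X_i = k \ \sim\ \gamma_k, \qquad k \in \{1, \dots, K\},
\]
\[
  p_\theta(y_1, \dots, y_n)
    \;=\; \sum_{x_1, \dots, x_n = 1}^{K}
      \pi_{x_1}\, \gamma_{x_1}(y_1)
      \prod_{i=2}^{n} Q_{x_{i-1}, x_i}\, \gamma_{x_i}(y_i),
  \qquad \theta = (\pi, Q, \gamma_1, \dots, \gamma_K).
\]
```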

Highlights

  • Let (Y1, . . . , Yn) be a sample following some unknown distribution P∗

  • The true distribution may not belong to the model at hand: this is the so-called misspecified setting

  • The goal of this paper is to establish a finite sample bound on the error of the maximum likelihood estimator for a large class of true distributions and a large class of nonparametric hidden Markov models


Summary

Introduction

Let (Y1, . . . , Yn) be a sample following some unknown distribution P∗. The maximum likelihood estimator can be formalized as follows: let {Pθ}θ∈Θ, the model, be a family of possible distributions; pick a distribution Pθ from the model which maximizes the likelihood of the observed sample. The goal of this paper is to establish a finite sample bound on the error of the maximum likelihood estimator for a large class of true distributions and a large class of nonparametric hidden Markov models. The main result of this paper is an oracle inequality that holds as soon as the models have controlled tails. This bound is optimal when the true distribution is an HMM taking values in R. A simplified version of our main result (Theorem 6) is the following oracle inequality: there exist constants A and n0 such that, if the penalty is large enough, the penalized maximum likelihood estimator θn satisfies, for all t ≥ 1, η ∈ (0, 1) and n ≥ n0, with probability larger than 1 − e−t − n−2, a bound on the prediction error K(θn). Appendix A contains the proof of the minimax adaptivity result and Appendix B contains the proof of the main technical lemma of Section 5.
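To fix ideas, here is a schematic rendering, under assumed notation, of the penalized maximum likelihood estimator and of the generic shape of an oracle inequality of this kind; the actual penalty, constants, and remainder terms are those stated in Theorem 6, not the placeholders used below.

```latex
% Schematic only: \hat{\theta}_n (the estimator called \theta_n above) maximizes
% the penalized log-likelihood over the candidate models M, and the oracle
% inequality bounds the prediction error K(\hat{\theta}_n) by the best
% approximation error within each model plus a penalty and a remainder term.
% The exact penalty, constants and remainder are those of Theorem 6.
\[
  \hat{\theta}_n \in \operatorname*{arg\,max}_{M,\ \theta \in \Theta_M}
    \Big\{ \tfrac{1}{n} \log p_\theta(Y_1, \dots, Y_n) - \mathrm{pen}_n(M) \Big\},
\]
\[
  K(\hat{\theta}_n) \;\lesssim\;
    \inf_{M} \Big\{ \inf_{\theta \in \Theta_M} K(\theta) + \mathrm{pen}_n(M) \Big\}
    \;+\; \big(\text{remainder depending on } t, \eta, n\big),
\]
% holding with probability at least 1 - e^{-t} - n^{-2}.
```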

Notations and assumptions
Hidden Markov models
The model selection estimator
Assumptions on the true distribution
Model assumptions
Oracle inequality for the prediction error
Minimax adaptive estimation using location-scale mixtures
Perspectives
Overview of the proof
Proofs
Proof of Lemma 12
Proofs for the mixture framework
Concentration inequality
Reduction of the set
Decomposition into simple sets
Findings
Choice of parameters