Abstract
Finite state space hidden Markov models are flexible tools to model phenomena with complex time dependencies: any process distribution can be approximated by a hidden Markov model with enough hidden states. We consider the problem of estimating an unknown process distribution using nonparametric hidden Markov models in the misspecified setting, that is, when the data-generating process may not be a hidden Markov model. We show that when the true distribution is exponentially mixing and satisfies a forgetting assumption, the maximum likelihood estimator recovers the best approximation of the true distribution. We prove a finite sample bound on the resulting error and show that it is optimal in the minimax sense, up to logarithmic factors, when the model is well specified.
Highlights
Let (Y1, . . . , Yn) be a sample following some unknown distribution P∗.
The true distribution may not belong to the model at hand: this is the so-called misspecified setting.
The goal of this paper is to establish a finite sample bound on the error of the maximum likelihood estimator for a large class of true distributions and a large class of nonparametric hidden Markov models.
Summary
Let (Y1, . . . , Yn) be a sample following some unknown distribution P∗. The maximum likelihood estimator can be formalized as follows: let {Pθ}θ∈Θ, the model, be a family of possible distributions; pick a distribution Pθ of the model which maximizes the likelihood of the observed sample. The goal of this paper is to establish a finite sample bound on the error of the maximum likelihood estimator for a large class of true distributions and a large class of nonparametric hidden Markov models. The main result of this paper is an oracle inequality that holds as soon as the models have controlled tails. This bound is optimal when the true distribution is a HMM taking values in R. A simplified version of our main result (Theorem 6) is the following oracle inequality: there exist constants A and n0 such that if the penalty is large enough, the penalized maximum likelihood estimator θn satisfies, for all t ≥ 1, η ∈ (0, 1) and n ≥ n0, with probability larger than 1 − e^{−t} − n^{−2}, an oracle bound on the excess risk K(θn). Appendix A contains the proof of the minimax adaptivity result and Appendix B contains the proof of the main technical lemma of Section 5.
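The penalized maximum likelihood estimator discussed above can be sketched in generic form as follows. This is a schematic only: the notation pen(n, θ) for the penalty and the exact form of the paper's criterion are assumptions for illustration, not the paper's precise statement.

```latex
% Schematic definition of the penalized maximum likelihood estimator
% (generic form; the paper's exact penalty and risk are not reproduced here).
\hat{\theta}_n \in \operatorname*{arg\,max}_{\theta \in \Theta}
  \left\{ \frac{1}{n} \log p_\theta(Y_1, \dots, Y_n) - \mathrm{pen}(n, \theta) \right\}
```

Here p_θ denotes the density of Pθ on the observed sample, and the excess risk K(θ) measures the discrepancy between P∗ and Pθ; the oracle inequality bounds K(θn) by the best approximation error within the model plus a penalty term, with high probability.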