Consistent Estimation of Partition Markov Models

Jesús García, Verónica González-López

doi:10.3390/e19040160

Abstract

The Partition Markov Model characterizes the process by a partition L of the state space, where the elements in each part of L share the same transition probability to an arbitrary element in the alphabet. This model aims to answer the following questions: what is the minimal number of parameters needed to specify a Markov chain, and how can these parameters be estimated? To answer these questions, we build a consistent strategy for model selection, which consists of the following: given a size n realization of the process, find a model within the Partition Markov class with a minimal number of parts to represent the process law. From the strategy, we derive a measure that establishes a metric in the state space. In addition, we show that if the law of the process is Markovian, then, eventually, when n goes to infinity, L will be retrieved. We show an application to model internet navigation patterns.

Highlights

Markov models have received enormous visibility for being powerful tools [1,2,3]
Under the assumption of this family, we address the problem of model selection, showing that the model can be selected consistently using the Bayesian Information Criterion (BIC)
The development of the partition concept in Markov processes allows for proving that, for a stationary, finite memory process and a sample large enough, it is theoretically possible to consistently find a minimal partition to represent the process and this can be accomplished in practice

Summary

Introduction

Markov models have received enormous visibility for being powerful tools [1,2,3]. [4] shows that the Bayesian Information Criterion (BIC) [5] can be used to consistently choose a Variable Length Markov Chain model in an efficient way using the Context. The Partition Markov Models are being used and explored intensively: for instance, [12] combines two statistical concepts, Copulas and Partition Markov Models, with the purpose of defining a natural correction for the estimator of the transition probabilities of a multivariate Markov process. We introduce a distance between the parts of a partition; this concept defines a metric on the state space and makes it possible to build efficient algorithms for estimating the optimal partition (see [10]). The proofs of the results introduced in this paper are included in Appendixes A and B.
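To make the idea of scoring a partition with a BIC-type criterion concrete, the following is a minimal sketch, not the authors' implementation. It assumes a first-order chain, a penalized log-likelihood of the form log ML − (|A| − 1)|L|/2 · log n, and hypothetical helper names (`transition_counts`, `partition_bic`). The toy sequence is built so that states a and b have identical transition behaviour.

```python
import math
from collections import Counter


def transition_counts(seq, order=1):
    """Count N(s, a): how often state s (a tuple of `order` symbols) is followed by a."""
    counts = Counter()
    for t in range(order, len(seq)):
        state = tuple(seq[t - order:t])
        counts[(state, seq[t])] += 1
    return counts


def partition_bic(seq, partition, alphabet, order=1):
    """Penalized maximum log-likelihood of a partition of the state space.

    Assumed form (illustrative): log ML - (|A| - 1) * |L| / 2 * log(n),
    where |L| is the number of parts and n the sample size.
    """
    counts = transition_counts(seq, order)
    log_ml = 0.0
    for part in partition:
        # States in the same part share one transition law, so pool their counts.
        pooled = {a: sum(counts[(s, a)] for s in part) for a in alphabet}
        total = sum(pooled.values())
        for c in pooled.values():
            if c > 0:
                log_ml += c * math.log(c / total)
    penalty = (len(alphabet) - 1) * len(partition) / 2 * math.log(len(seq))
    return log_ml - penalty


# Toy sample: both 'a' and 'b' are always followed by 'c'.
seq = "acbc" * 25
alphabet = "abc"
fine = [[("a",)], [("b",)], [("c",)]]    # one part per state
merged = [[("a",), ("b",)], [("c",)]]    # 'a' and 'b' share a part

bic_fine = partition_bic(seq, fine, alphabet)
bic_merged = partition_bic(seq, merged, alphabet)
print(bic_merged > bic_fine)
```

In this toy case the two partitions reach the same maximum likelihood, so the smaller partition scores higher purely because it needs fewer parameters.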

Preliminaries

Consistent Estimation through the Bayesian Information Criterion

A Metric on the State Space

Consistent Estimation of the Process’s Partition

Method

Conclusions