Abstract

High-order discrete Markov chains are essential for analyzing the dependency structure of data sets. Because the true order is an unknown parameter, statisticians have developed multiple order estimators so that a Markov chain can be applied correctly. It is natural to ask which order estimator performs best under different parameter combinations. To evaluate the performance of these estimators, we study four of them in this paper: the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the maximal fluctuation estimation method (PS), and the approximate χ²-distribution method (D_k). We simulated C^r × C transition matrices to generate word-count-based Markov sequences with the most straightforward initial distribution. We found that PS and D_k give more accurate estimates of the discrete Markov order. Although AIC and BIC are commonly applied, they are not the most accurate. Accuracy declines approximately exponentially as the Markov model becomes more complex, i.e., as the order r and alphabet size C grow (r ≥ 1 and C ≥ 3). AIC's accuracy is higher when the Markov chain length is relatively small, but D_k yields slightly higher accuracy under the same setting. PS gives more reasonable estimates when the Markov order is the variable, i.e., 1 ≤ r ≤ 3. D_k gives more reasonable estimates when the length L and alphabet size C are the variables, i.e., 150 ≤ L ≤ 800 and 3 ≤ C ≤ 5.
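The simulation-and-estimation loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the uniform draw of the first r symbols, the random row-normalized transition matrix, and the textbook forms of AIC (-2 log L + 2p) and BIC (-2 log L + p log n) with p = C^k (C - 1) free parameters are all assumptions made for illustration. The PS and D_k estimators are not shown.

```python
import math
import random
from collections import Counter
from itertools import product


def random_transition_matrix(C, r, rng):
    """Map each length-r context to a random probability row over C symbols (C^r rows)."""
    matrix = {}
    for context in product(range(C), repeat=r):
        weights = [rng.random() for _ in range(C)]
        total = sum(weights)
        matrix[context] = [w / total for w in weights]
    return matrix


def simulate_chain(matrix, C, r, L, rng):
    """Generate L symbols; the first r are drawn uniformly (an assumed initial distribution)."""
    seq = [rng.randrange(C) for _ in range(r)]
    while len(seq) < L:
        context = tuple(seq[len(seq) - r:]) if r > 0 else ()
        seq.append(rng.choices(range(C), weights=matrix[context])[0])
    return seq[:L]


def log_likelihood(seq, k, max_order):
    """Maximized log-likelihood of an order-k fit, conditioning on the first max_order symbols."""
    transitions = Counter()
    contexts = Counter()
    for t in range(max_order, len(seq)):
        context = tuple(seq[t - k:t])
        transitions[(context, seq[t])] += 1
        contexts[context] += 1
    return sum(n * math.log(n / contexts[ctx]) for (ctx, _), n in transitions.items())


def estimate_order(seq, C, max_order, criterion="bic"):
    """Return the order in 0..max_order that minimizes the chosen information criterion."""
    n = len(seq) - max_order  # number of transitions used by every candidate order
    scores = {}
    for k in range(max_order + 1):
        free_params = (C ** k) * (C - 1)
        penalty = 2 * free_params if criterion == "aic" else free_params * math.log(n)
        scores[k] = -2 * log_likelihood(seq, k, max_order) + penalty
    return min(scores, key=scores.get)


rng = random.Random(0)
C, r, L = 3, 1, 800  # values taken from the ranges quoted in the abstract
matrix = random_transition_matrix(C, r, rng)
seq = simulate_chain(matrix, C, r, L, rng)
print("AIC estimate:", estimate_order(seq, C, max_order=3, criterion="aic"))
print("BIC estimate:", estimate_order(seq, C, max_order=3, criterion="bic"))
```

Conditioning every candidate order on the same first max_order symbols keeps the likelihoods comparable across orders. Repeating the run over many random matrices and counting how often the estimate equals r would give an accuracy figure of the kind the abstract compares.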
