Abstract

An information-theoretic approach to numerically determine the Markov order of discrete stochastic processes defined over a finite state space is introduced. To measure statistical dependencies between different time points of symbolic time series, two information-theoretic measures are proposed. The first measure is time-lagged mutual information between the random variables Xn and Xn+k, representing the values of the process at time points n and n + k, respectively. The measure will be termed autoinformation, in analogy to the autocorrelation function for metric time series, but using Shannon entropy rather than linear correlation. This measure is complemented by the conditional mutual information between Xn and Xn+k, removing the influence of the intermediate values Xn+k−1, …, Xn+1. The second measure is termed partial autoinformation, in analogy to the partial autocorrelation function (PACF) in metric time series analysis. Mathematical relations with known quantities such as the entropy rate and active information storage are established. Both measures are applied to a number of examples, ranging from theoretical Markov and non-Markov processes with known stochastic properties, to models from statistical physics, and finally, to a discrete transform of an EEG data set. The combination of autoinformation and partial autoinformation yields important insights into the temporal structure of the data in all test cases. For first- and higher-order Markov processes, partial autoinformation correctly identifies the order parameter, but also suggests extended, non-Markovian effects in the examples that lack the Markov property. For three hidden Markov models (HMMs), the underlying Markov order is found. The combination of both quantities may be used as an early step in the analysis of experimental, non-metric time series and can be employed to discover higher-order Markov dependencies, non-Markovianity and periodicities in symbolic time series.
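
To make the two measures concrete, the following is a minimal plug-in (frequency-based) sketch of autoinformation A(k) = I(X_n; X_{n+k}) and partial autoinformation π(k) = I(X_{n+k}; X_n | X_{n+1}, …, X_{n+k−1}) for a symbolic sequence. The function names and the toy Markov chain are illustrative choices, not the implementation used in the study, and plug-in block entropies are only reliable for small alphabets and short lags.

```python
import numpy as np
from collections import Counter

def entropy(samples):
    """Plug-in Shannon entropy (in bits) of a list of tuples."""
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def blocks(x, lags):
    """All tuples (x[n + l] for l in lags) over valid start positions n."""
    m = max(lags)
    return [tuple(x[n + l] for l in lags) for n in range(len(x) - m)]

def autoinformation(x, k):
    """A(k) = I(X_n; X_{n+k}), estimated from empirical frequencies."""
    return (entropy(blocks(x, [0])) + entropy(blocks(x, [k]))
            - entropy(blocks(x, [0, k])))

def partial_autoinformation(x, k):
    """pi(k) = I(X_{n+k}; X_n | X_{n+1}, ..., X_{n+k-1}),
    written as a combination of four block entropies."""
    if k == 1:
        return autoinformation(x, 1)
    mid = list(range(1, k))  # lags of the intermediate values
    return (entropy(blocks(x, mid + [k])) + entropy(blocks(x, [0] + mid))
            - entropy(blocks(x, [0] + mid + [k])) - entropy(blocks(x, mid)))

# Toy example: a binary first-order Markov chain with strong persistence.
# Partial autoinformation should be clearly positive at lag 1 and close to
# zero for all larger lags, while autoinformation decays slowly with k.
rng = np.random.default_rng(0)
x = [0]
for _ in range(20000):
    x.append(x[-1] if rng.random() < 0.9 else 1 - x[-1])

for k in range(1, 6):
    print(f"k={k}  A(k)={autoinformation(x, k):.4f}  "
          f"pi(k)={partial_autoinformation(x, k):.4f}")
```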

Highlights

  • Information theory occupies a central role in time series analysis

  • Active information storage can be expressed as the difference between a Shannon entropy and the entropy rate: $a_X(n+k-1, k) = I(X_{n+k}; X^{(k)}_{n+k-1}) = H(X_{n+k}) - H(X_{n+k} \mid X^{(k)}_{n+k-1}) = H(X_{n+k}) - h_X(n+k-1, k)$, where $X^{(k)}_{n+k-1} = (X_n, \ldots, X_{n+k-1})$ denotes the block of the $k$ preceding values (a numerical sketch of this identity follows the highlights)

  • While autoinformation measures the statistical dependence between Xn and Xn+k directly, partial autoinformation removes the influence of the segment between both time points

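The highlighted identity states that active information storage equals the entropy of X_{n+k} minus the finite-history entropy rate h_X(n+k−1, k) = H(X_{n+k} | X^{(k)}_{n+k−1}). The sketch below illustrates this relation with naive block-entropy estimates under an assumption of stationarity; the function names and the coin-flip example are illustrative and not taken from the paper.

```python
import numpy as np
from collections import Counter

def block_entropy(x, L):
    """Plug-in Shannon entropy (bits) of length-L blocks of the symbol sequence x."""
    counts = Counter(tuple(x[n:n + L]) for n in range(len(x) - L + 1))
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return -np.sum(p * np.log2(p))

def entropy_rate(x, k):
    """Finite-history entropy rate h_X(k) = H(X_{n+k} | X_n, ..., X_{n+k-1})."""
    return block_entropy(x, k + 1) - block_entropy(x, k)

def active_information_storage(x, k):
    """a_X(k) = H(X_{n+k}) - h_X(k): marginal entropy minus entropy rate."""
    return block_entropy(x, 1) - entropy_rate(x, k)

# Example: i.i.d. fair coin flips, a memoryless process.
rng = np.random.default_rng(1)
coin = rng.integers(0, 2, size=50000).tolist()
print("entropy rate  :", entropy_rate(coin, 3))
print("active storage:", active_information_storage(coin, 3))
```

For the i.i.d. fair coin, the estimated entropy rate is close to 1 bit per symbol and the active information storage is close to zero, as expected for a memoryless process.
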
Introduction and Background

Information theory occupies a central role in time series analysis. The concept of entropy provides numerous important connections to statistical physics and thermodynamics, often useful in the interpretation of the results (Kullback, 1959; Cover and Thomas, 2006). For symbolic time series, collections of theory and methods are readily available (Daw et al., 2003; Mézard and Montanari, 2009). For continuous-valued, discrete-time stochastic processes, the Box-Jenkins approach provides a standardized analysis procedure (Box and Jenkins, 1976). The procedure addresses the impressive complexity of possible stochastic processes by combining semi-quantitative, visual analysis steps with a number of rigorous statistical test procedures. The first step in Box-Jenkins analysis is the visual and statistical assessment of the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the data. The order of purely autoregressive processes can be directly deduced from the PACF coefficients: for a p-th order autoregressive process, it can be shown that the PACF coefficients for time lags larger than p are equal to zero, within statistical limits.
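
As an illustration of this cutoff property (not part of the original text), the following sketch simulates an AR(2) process with arbitrarily chosen coefficients and estimates its PACF using statsmodels; coefficients beyond lag 2 should fall within the approximate 95% confidence band around zero.

```python
import numpy as np
from statsmodels.tsa.stattools import pacf

# Simulate a stationary AR(2) process: x_t = 0.5 x_{t-1} - 0.3 x_{t-2} + eps_t
rng = np.random.default_rng(42)
n = 5000
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + rng.standard_normal()

coeffs = pacf(x, nlags=10)          # coeffs[0] is the trivial lag-0 value of 1
for lag, c in enumerate(coeffs[1:], start=1):
    # Lags 1 and 2 are clearly nonzero; lags > 2 should stay within the
    # approximate 95% band +/- 1.96/sqrt(n) expected under the cutoff.
    print(f"lag {lag:2d}: PACF = {c:+.3f}")
print("95% band:", 1.96 / np.sqrt(n))
```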

