Abstract

In the absence of mechanistic or phenomenological models of real-world systems, data-driven models become necessary. The discovery of various embedding theorems in the 1980s and 1990s motivated a powerful set of tools for analyzing deterministic dynamical systems via delay-coordinate embeddings of observations of their component states. However, in many branches of science, the condition of operational determinism is not satisfied, and stochastic models must be brought to bear. For such stochastic models, the tool set developed for delay-coordinate embedding is no longer appropriate, and a new toolkit must be developed. We present an information-theoretic criterion, the negative log-predictive likelihood, for selecting the embedding dimension for a predictively optimal data-driven model of a stochastic dynamical system. We develop a nonparametric estimator for the negative log-predictive likelihood and compare its performance to a recently proposed criterion based on active information storage. Finally, we show how the output of the model selection procedure can be used to compare candidate predictors for a stochastic system to an information-theoretic lower bound.

Highlights

  • When studying the dynamics of a newly encountered complex system, the best first model for the system often comes directly from measurements of the system itself

  • We have developed an information-theoretic model selection procedure for determining the optimal model order for prediction of a subcomponent of a stochastic dynamical system

  • We demonstrated that minimizing the entropy rate is, conceptually, exactly the complement of maximizing the active information storage
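
The complement relationship in the last highlight can be sketched with a standard identity. The notation here is an assumption for illustration and is not taken from the paper: $X_t$ is a stationary scalar process, $X_{t-p}^{t-1}$ is its length-$p$ past, $A(p)$ is the active information storage at order $p$, and $H$ is the (differential) entropy.

```latex
\begin{align}
A(p) &= I\left[X_t ; X_{t-p}^{t-1}\right]
      = H\left[X_t\right] - H\left[X_t \mid X_{t-p}^{t-1}\right], \\
\operatorname*{arg\,max}_{p} A(p)
     &= \operatorname*{arg\,min}_{p} H\left[X_t \mid X_{t-p}^{t-1}\right].
\end{align}
```

Under stationarity, $H[X_t]$ does not depend on $p$, so maximizing the mutual information over $p$ selects the same order as minimizing the conditional entropy, whose limit as $p \to \infty$ is the entropy rate. The two criteria can still differ in practice because they are estimated differently from finite data.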


Summary

INTRODUCTION

When studying the dynamics of a newly encountered complex system, the best first model for the system often comes directly from measurements of the system itself. A criterion was recently proposed in [20] for selecting the embedding dimension for a delay-coordinate embedding-based predictor of a deterministic dynamical system. The authors propose maximizing the estimated mutual information between the embedding vector used for prediction and the future at some time horizon, a quantity called the active information storage [28], to determine the appropriate embedding dimension. Their approach based on active information storage has a single tuning parameter that can be fixed following an asymptotic argument, which is beneficial both in terms of computational speed and ease of use. They show that, in practice, their method chooses the embedding parameters that give optimal prediction for several synthetic and real-world systems when prediction is restricted to nearest-neighbor regression using the delay-coordinate embedding.
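
For comparison with the mutual-information criterion described above, the following is a minimal sketch of order selection by minimizing a negative log-predictive likelihood. It is not the paper's estimator: the Gaussian product-kernel conditional density estimate, the rule-of-thumb bandwidth, the leave-one-out scoring, and the names `delay_embed`, `loo_nlpl`, and `select_order` are all assumptions made for illustration.

```python
import numpy as np


def delay_embed(x, p):
    """Return (pasts, futures); row t of pasts is (x[t], ..., x[t+p-1]), futures[t] is x[t+p]."""
    x = np.asarray(x, dtype=float)
    n = len(x) - p
    pasts = np.column_stack([x[j:j + n] for j in range(p)])
    futures = x[p:]
    return pasts, futures


def loo_nlpl(x, p, bandwidth_scale=1.0):
    """Leave-one-out negative log-predictive likelihood at model order p.

    Assumption of this sketch: a Gaussian kernel conditional density
    estimate with a single rule-of-thumb bandwidth for all coordinates.
    """
    pasts, futures = delay_embed(x, p)
    n = len(futures)
    # Rule-of-thumb bandwidth for a (p + 1)-dimensional joint density.
    h = bandwidth_scale * np.std(x) * n ** (-1.0 / (p + 5))
    # Pairwise squared distances between pasts and between futures.
    d_past = ((pasts[:, None, :] - pasts[None, :, :]) ** 2).sum(axis=2)
    d_fut = (futures[:, None] - futures[None, :]) ** 2
    # Log kernel weights on the past; exclude each point from its own estimate.
    log_w = -0.5 * d_past / h**2
    np.fill_diagonal(log_w, -np.inf)
    # Log of the normalized Gaussian kernel on the future value.
    log_k = -0.5 * d_fut / h**2 - 0.5 * np.log(2.0 * np.pi * h**2)
    # log f(future | past) = logsumexp(log_w + log_k) - logsumexp(log_w)
    log_num = np.logaddexp.reduce(log_w + log_k, axis=1)
    log_den = np.logaddexp.reduce(log_w, axis=1)
    return -np.mean(log_num - log_den)


def select_order(x, p_max):
    """Score orders 1..p_max on the same targets x[p_max:] and return (best order, scores)."""
    x = np.asarray(x, dtype=float)
    scores = np.array([loo_nlpl(x[p_max - p:], p) for p in range(1, p_max + 1)])
    return int(np.argmin(scores)) + 1, scores


if __name__ == "__main__":
    # Toy first-order autoregression (illustrative data, not from the paper).
    rng = np.random.default_rng(0)
    x = np.zeros(400)
    for t in range(1, len(x)):
        x[t] = 0.8 * x[t - 1] + 0.5 * rng.standard_normal()
    p_hat, scores = select_order(x, p_max=4)
    print("selected order:", p_hat)
    print("scores by order:", scores)
```

Trimming the series so that every order is scored on the same targets keeps the comparison across orders fair. The bandwidth is the main design choice here, and a data-driven choice such as cross-validation would be a natural replacement for the rule of thumb.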

Stochastic dynamical systems
Active information storage of stochastic dynamical systems
The negative log-predictive likelihood
DEMONSTRATION OF MODEL SELECTION WITH STOCHASTIC MAPS
Stochastic logistic map
Self-exciting threshold autoregressive model
Autoregressive conditionally heteroskedastic model
THE UTILITY OF ENTROPY RATE IN EVALUATING CANDIDATE PREDICTORS
Findings
CONCLUSIONS
