Abstract

We investigate how efficiently the known, sparse causality structure underlying a simulated multivariate linear process can be retrieved from the analysis of short time series. Causality is quantified by conditional transfer entropy, and the network is constructed by retaining only the statistically validated contributions. We compare results from three methodologies: two commonly used regularization methods, Glasso and ridge, and a newly introduced technique, LoGo, based on the combination of information filtering networks and graphical modelling. For these three methodologies we explore the regions of time series lengths and model parameters in which a significant fraction of the true causality links is retrieved. We conclude that when time series are short, with lengths smaller than the number of variables, sparse models are better suited to uncovering true causality links, with LoGo retrieving the true causality network more accurately than Glasso and ridge.

Highlights

  • Establishing causal relations between variables from observation of their behaviour in time is central to scientific investigation and it is at the core of data-science where these causal relations are the basis for the construction of useful models and tools capable of prediction

  • Our results demonstrate that sparse models are superior in retrieving the true causality structure for short time series

  • Conditional transfer entropies were statistically validated against the null hypothesis at a p-value threshold of pV = 1%


Summary

Introduction

Establishing causal relations between variables from observation of their behaviour in time is central to scientific investigation, and it is at the core of data science, where these causal relations are the basis for the construction of useful models and tools capable of prediction. Predictive models are methodologies, systems, or equations that identify and make use of such relations between sets of variables, so that knowledge of one set of variables provides information about the values of another set. This problem is intrinsically high-dimensional, with many input and output data. Any model that aims to explain the underlying system will involve a number of elements of the same order of magnitude as the number of relevant relations between the system’s variables. In complex systems, such as financial markets or the brain, prediction is probabilistic in nature, and modelling concerns inferring the probability of the values of one set of variables given the values of another set. This poses a great challenge for model construction, model selection, and parameter estimation, because the number of relations between variables scales with at least the square of the number of variables, whereas, for a given fixed observation window, the amount of information gathered from such variables scales at most linearly with the number of variables [1, 2].
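The scaling argument above can be made concrete with a small numerical sketch. This is an illustration under stated assumptions, not the procedure used in the paper: the data are independent Gaussian noise rather than the simulated linear process, the sizes `n_vars` and `n_obs` are arbitrary, the helper names are hypothetical, and the ridge-style penalty is assumed to take the simple form of adding `gamma` times the identity to the sample covariance. It shows that with fewer observations than variables the sample covariance is rank-deficient and cannot be inverted without regularization.

```python
import numpy as np


def sample_covariance_rank(n_vars, n_obs, seed=0):
    """Return the sample covariance of Gaussian noise and its numerical rank.

    Hypothetical helper: the data are i.i.d. noise, used only to show the
    rank deficiency that appears when n_obs < n_vars.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_obs, n_vars))  # rows are time steps
    cov = np.cov(x, rowvar=False)             # (n_vars, n_vars) matrix
    return cov, np.linalg.matrix_rank(cov)


def ridge_inverse(cov, gamma):
    """Invert cov + gamma * I (assumed form of the ridge-style penalty)."""
    n = cov.shape[0]
    return np.linalg.inv(cov + gamma * np.eye(n))


n_vars, n_obs = 50, 20                 # short series: fewer steps than variables
n_links = n_vars * (n_vars - 1)        # candidate directed links, grows as N^2
n_values = n_vars * n_obs              # observed values, grows linearly in N

cov, rank = sample_covariance_rank(n_vars, n_obs)
precision = ridge_inverse(cov, gamma=0.1)

print(f"candidate links: {n_links}, observed values: {n_values}")
print(f"rank of sample covariance: {rank} (full rank would be {n_vars})")
```

Because the centred data matrix has at most `n_obs - 1` independent rows, the sample covariance has at most that rank, which is why some form of regularization or sparsity is needed before an inverse covariance can be estimated from short series.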

