Deep learning has achieved remarkable success in recent years, especially in challenging predictive spatiotemporal analytics (PSTA) tasks such as disease prediction, climate forecasting, and traffic prediction, where the data exhibit intrinsic dependence relationships that generally manifest at multiple spatiotemporal scales. However, given a specific PSTA task and its data set, how to determine an appropriate configuration for a deep learning model, analyze the model's learning behavior theoretically, and characterize its learning capacity quantitatively remain open questions. To demystify the power of deep learning for PSTA in a theoretically sound and explainable way, in this article we provide a comprehensive framework for deep learning model design and information-theoretic analysis. First, we develop and demonstrate a novel interactively and integratively connected deep recurrent neural network (I2DRNN) model. I2DRNN consists of three modules: an input module that integrates data from heterogeneous sources; a hidden module that captures information at different scales while allowing information to flow interactively between layers; and an output module that models the integrative effects of information from the various hidden layers to generate the output predictions. Second, to prove theoretically that the designed model can learn multiscale spatiotemporal dependence in PSTA tasks, we provide an information-theoretic analysis of the information-based learning capacity (i-CAP) of the proposed model. In doing so, we address an important open question in deep learning: how to determine the necessary and sufficient configurations of a designed deep learning model with respect to the given learning data sets. Third, to validate the I2DRNN model and confirm its i-CAP, we systematically conduct a series of experiments on both synthetic data sets and real-world PSTA tasks.
The experimental results show that the I2DRNN model outperforms both classical and state-of-the-art models on all data sets and PSTA tasks. More importantly, the experiments confirm that the proposed model captures multiscale spatiotemporal dependence that is meaningful in the real-world context. Furthermore, the model configuration that yields the best performance on a given data set always falls within the range between the necessary and sufficient configurations derived from the information-theoretic analysis.
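To make the three-module description more concrete, here is a toy forward pass in Python/NumPy. It is a hypothetical sketch, not the authors' I2DRNN: the abstract does not give the cell equations, so the tanh recurrence, the top-down connection from the layer above at the previous time step (a stand-in for "interactive" flow between layers), the per-layer update periods (a Clockwork-RNN-style assumption for capturing "different scales"), and the names `init_params` and `forward` are all assumptions made for illustration.

```python
import numpy as np


def init_params(input_dims, hidden_size, num_layers, output_size, seed=0):
    """Hypothetical parameter layout for a toy three-module recurrent model."""
    rng = np.random.default_rng(seed)
    in_dim = sum(input_dims)  # input module: sources are concatenated
    layers = []
    for l in range(num_layers):
        below_dim = in_dim if l == 0 else hidden_size
        layers.append({
            "W_x": rng.normal(0.0, 0.1, (hidden_size, below_dim)),    # bottom-up
            "W_h": rng.normal(0.0, 0.1, (hidden_size, hidden_size)),  # self-recurrence
            "W_a": rng.normal(0.0, 0.1, (hidden_size, hidden_size)),  # top-down (assumed)
            "b": np.zeros(hidden_size),
        })
    return {
        "layers": layers,
        # Output module: one linear map over ALL hidden layers (assumed form).
        "W_out": rng.normal(0.0, 0.1, (output_size, hidden_size * num_layers)),
        "b_out": np.zeros(output_size),
    }


def forward(params, sources, periods=None):
    """Run the toy model over T steps; each source has shape (T, d_i)."""
    x = np.concatenate(sources, axis=1)  # integrate heterogeneous sources
    layers = params["layers"]
    num_layers = len(layers)
    hidden = layers[0]["W_h"].shape[0]
    if periods is None:
        # Assumed multiscale scheme: layer l updates every 2**l steps.
        periods = [2 ** l for l in range(num_layers)]
    h = np.zeros((num_layers, hidden))
    outputs = []
    for t in range(x.shape[0]):
        prev = h.copy()  # hidden states from step t-1
        for l, p in enumerate(layers):
            if t % periods[l] != 0:
                continue  # slower layer keeps its previous state
            below = x[t] if l == 0 else h[l - 1]  # layer below at step t
            above = prev[l + 1] if l + 1 < num_layers else np.zeros(hidden)
            h[l] = np.tanh(p["W_x"] @ below + p["W_h"] @ prev[l]
                           + p["W_a"] @ above + p["b"])
        # Integrative output: combine the states of every hidden layer.
        outputs.append(params["W_out"] @ h.reshape(-1) + params["b_out"])
    return np.stack(outputs)


rng = np.random.default_rng(1)
sources = [rng.normal(size=(12, 3)), rng.normal(size=(12, 2))]  # two toy sources
params = init_params([3, 2], hidden_size=4, num_layers=3, output_size=1)
print(forward(params, sources).shape)  # (12, 1)
```

Concatenation is used only because it is the simplest way to fuse sources, and the single linear read-out over all layers is likewise a minimal stand-in for an integrative output module; neither choice is taken from the article.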