While massive multiple-input multiple-output (MIMO) has achieved tremendous success in both theory and practice, its performance degrades sharply in moderate- and high-mobility scenarios (e.g., 30 km/h) because uplink-downlink channel duality no longer holds. This “curse of mobility” has spurred research on channel prediction in high-mobility scenarios. Instead of predicting the channel response matrix in the space-frequency domain, we predict it in the angle-delay domain, exploiting the high angle-delay resolution of wideband massive MIMO systems. Specifically, we study the general angle-delay domain channel characterization and show that: 1) the correlations between the elements of the angle-delay domain channel response matrix (ADCRM) are significantly decoupled; 2) when the number of antennas and the bandwidth are limited, the decoupling is incomplete and residual correlations remain between neighboring ADCRM elements. Focusing on the ADCRM, we then propose two channel prediction methods: a model-driven unsupervised-learning method based on a spatio-temporal autoregressive (ST-AR) model, and a data-driven supervised-learning method based on deep learning (DL). While the model-driven method provides a principled way to predict the channel, the data-driven method generalizes to various channel scenarios. In particular, ST-AR exploits the residual spatio-temporal correlations between each channel element and its nearest neighboring elements, and DL performs element-wise angle-delay domain channel prediction using a complex-valued neural network (CVNN). Simulation results under 3GPP non-line-of-sight (NLOS) scenarios indicate that both the proposed ST-AR and CVNN-based methods improve channel prediction accuracy over the state-of-the-art Prony-based angular-delay domain (PAD) prediction method.
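
To make the angle-delay viewpoint concrete, the following is a minimal sketch of how a space-frequency channel matrix could be mapped into angle-delay bins. It is not the ADCRM construction used in this work, whose exact definition is not given here. It assumes a uniform linear array, a plain 2-D DFT over antennas and subcarriers, and a single propagation path whose angle and delay fall exactly on the DFT grid. The function names `to_angle_delay`, `single_path_channel`, and `strongest_bin` are hypothetical.

```python
import cmath
import math


def to_angle_delay(H):
    """Map a space-frequency channel H[n][k] (antenna n, subcarrier k)
    to angle-delay bins G[a][d] with a normalized 2-D DFT.
    Assumption: a DFT basis is an adequate angle/delay dictionary."""
    N, K = len(H), len(H[0])
    scale = 1.0 / math.sqrt(N * K)
    G = [[0j] * K for _ in range(N)]
    for a in range(N):
        for d in range(K):
            acc = 0j
            for n in range(N):
                for k in range(K):
                    phase = 2 * math.pi * (n * a / N + k * d / K)
                    acc += H[n][k] * cmath.exp(1j * phase)
            G[a][d] = acc * scale
    return G


def single_path_channel(N, K, a0, d0, gain=1.0):
    """Toy on-grid single-path channel: angle bin a0, delay bin d0."""
    return [
        [gain * cmath.exp(-2j * math.pi * (n * a0 / N + k * d0 / K))
         for k in range(K)]
        for n in range(N)
    ]


def strongest_bin(G):
    """Return the (angle, delay) index of the largest-magnitude element."""
    bins = ((a, d) for a in range(len(G)) for d in range(len(G[0])))
    return max(bins, key=lambda ad: abs(G[ad[0]][ad[1]]))


# Toy usage: 8 antennas, 8 subcarriers, one path at angle bin 3, delay bin 2.
H = single_path_channel(8, 8, a0=3, d0=2)
G = to_angle_delay(H)
peak = strongest_bin(G)
```

In this toy case the channel energy collapses into the single bin `(3, 2)`. With off-grid angles or delays, or with fewer antennas and subcarriers, energy would leak into adjacent bins, which is one way to picture the residual correlations between neighboring elements described above.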