Transformers have shown excellent performance in long-term time series forecasting because of their capability to capture long-term dependencies. However, existing Transformer-based approaches often overlook characteristics unique to time series, particularly multi-scale periodicity, which leaves a gap in their inductive biases. To address this oversight, this study develops the temporal diffusion Transformer (TDT) to reveal the intrinsic evolution processes of time series. First, to uncover the connections among the periods of multi-periodic time series, the series are transformed into various types of patches using a multi-scale patching method. Inspired by the principles of heat conduction, TDT conceptualizes the evolution of a time series as a diffusion process. TDT aims to achieve global consistency by minimizing an energy constraint, which is accomplished through the iterative updating of patches. Finally, the results of these iterations across multiple periods are aggregated to form the TDT output. Compared with previous advanced models, TDT achieved state-of-the-art predictive performance in our experiments. On most datasets, TDT outperformed the baseline models by approximately 2% in terms of mean squared error (MSE) and mean absolute error (MAE). Its effectiveness was further validated through ablation, efficiency, and hyperparameter analyses. TDT also offers intuitive explanations by elucidating the diffusion process of time series patches throughout the iterative procedure.
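The core ideas named in the abstract — splitting a series into patches at a given scale and iteratively updating them with a heat-conduction-style diffusion that lowers a quadratic energy — can be sketched as follows. This is a minimal illustration, not the paper's actual architecture: the patch length, the step size `alpha`, and the neighbor-difference energy are all assumptions chosen for clarity.

```python
import numpy as np

def make_patches(series, patch_len):
    # Split a 1-D series into non-overlapping patches of length patch_len.
    # In a multi-scale setup this would be repeated for several patch_len
    # values, one per candidate period (an assumption for illustration).
    n = len(series) // patch_len
    return series[: n * patch_len].reshape(n, patch_len)

def energy(patches):
    # Quadratic "energy": squared differences between adjacent patches.
    return float(np.sum((patches[1:] - patches[:-1]) ** 2))

def diffusion_step(patches, alpha=0.1):
    # One heat-conduction-style update: each patch moves toward its
    # neighbors (a discrete Laplacian), which is gradient descent on
    # the energy above and drives patches toward global consistency.
    lap = np.zeros_like(patches)
    lap[1:-1] = patches[:-2] + patches[2:] - 2 * patches[1:-1]
    lap[0] = patches[1] - patches[0]          # boundary patches have
    lap[-1] = patches[-2] - patches[-1]       # a single neighbor
    return patches + alpha * lap

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 8 * np.pi, 96)) + 0.3 * rng.normal(size=96)
patches = make_patches(series, patch_len=8)

e0 = energy(patches)
for _ in range(10):
    patches = diffusion_step(patches)
assert energy(patches) < e0  # iterative diffusion lowers the energy
```

With `alpha` no larger than 0.5 the explicit update is stable, and each iteration provably decreases the energy, mirroring the abstract's description of reaching global consistency by minimizing an energy constraint through iterative patch updates.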