Accurate spatial-temporal traffic demand prediction is crucial for urban computing, traffic management, and future autonomous driving. In this paper, a novel Spatial-Temporal Guided Multi-graph Sandwich-Transformer (STGMT) is proposed to address the ubiquitous spatial-temporal heterogeneity in traffic demand forecasting. Unlike the original Transformer, STGMT employs Time to Vector (Time2Vec) and Node to Vector (Node2Vec) in the embedding layer to obtain universal representations of temporal nodes and spatial nodes, respectively, which are then combined to form Spatial-Temporal Embedding (STE) blocks. The STE guides the attention mechanism, maintaining a unique parameter space for spatial-temporal nodes and enabling the learning of node-specific patterns. In STGMT, we develop Multi-head Temporal Attention (MTA) and Multi-head Temporal Interactive Attention (MTIA) to extract temporal features, while Multi-head Spatial Attention (MSA) extracts spatial features. Furthermore, MSA incorporates both an accessibility graph determined by road topology and a similarity graph determined by specific traffic events to characterize the pairwise relationships among spatial nodes. The attention and feed-forward layers are rearranged and combined to form the Sandwich-Transformer. Extensive experiments on public datasets covering two types of node-level tasks (highway and urban) indicate that STGMT outperforms state-of-the-art models. By addressing spatial-temporal heterogeneity, STGMT improves the accuracy of traffic demand prediction and offers valuable guidance for traffic planning and operations. Our code and data are open source at https://github.com/YanJieWen/STGMT-Tensorflow-implementation.
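To make the embedding and multi-graph attention ideas concrete, the following is a minimal NumPy sketch and not the released STGMT code. It uses the standard Time2Vec form (one linear component plus sine components), a random matrix as a placeholder for Node2Vec output, and single-head attention masked by each graph. Forming the STE by addition, fusing the two graph-specific outputs by averaging, and all function and variable names are assumptions made for illustration only.

```python
import numpy as np


def time2vec(tau, omega, phi):
    """Time2Vec: first component is linear in time, the rest are sine-periodic."""
    tau = np.asarray(tau, dtype=float)[..., None]   # (..., 1)
    z = tau * omega + phi                           # (..., k)
    return np.concatenate([z[..., :1], np.sin(z[..., 1:])], axis=-1)


def masked_spatial_attention(x, adjacency):
    """Single-head scaled dot-product attention over nodes, limited to graph edges."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                   # (N, N) pairwise scores
    scores = np.where(adjacency > 0, scores, -1e9)  # block non-adjacent pairs
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights


rng = np.random.default_rng(0)
num_steps, num_nodes, dim = 3, 4, 8

# Temporal embedding from Time2Vec with (here random) frequencies and phases.
omega, phi = rng.normal(size=dim), rng.normal(size=dim)
time_emb = time2vec(np.arange(num_steps), omega, phi)   # (T, dim)

# Placeholder for Node2Vec output; a real run would learn this from the road graph.
node_emb = rng.normal(size=(num_nodes, dim))            # (N, dim)

# Assumption: the spatial-temporal embedding is formed by broadcast addition.
ste = time_emb[:, None, :] + node_emb[None, :, :]       # (T, N, dim)

# Two hypothetical graphs over the same nodes (self-loops included).
accessibility = np.array([[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 1]])
similarity = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]])

# Node features at the first time step, guided by the STE.
x = rng.normal(size=(num_nodes, dim)) + ste[0]
out_access, w_access = masked_spatial_attention(x, accessibility)
out_sim, w_sim = masked_spatial_attention(x, similarity)

# Assumption: graph-specific outputs are fused by a simple average.
fused = 0.5 * (out_access + out_sim)                    # (N, dim)
```

In this sketch each graph restricts which node pairs can attend to one another, so the accessibility graph and the similarity graph yield different attention patterns over the same node features.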