GridFormer: Spatial-Temporal Transformer Network for Citywide Crowd Flow Prediction

Chaoqun Su, Defu Lian, Chenwang Wu

doi:10.3233/faia230519

Abstract

Crowd flow prediction plays a vital role in fields such as traffic management, public safety, and urban planning. The main challenge lies in effectively modeling periodic temporal dependency and long-range spatial dependency. In the temporal domain, crowd flow shows strong periodicity, which existing works exploit to build multi-time-scale spatial-temporal features. However, these works rarely account for disturbances in the period: crowd flow is not strictly periodic. In the spatial domain, existing works mainly use CNNs to capture spatial dependency, but the small receptive field of the convolution operator limits their ability to capture long-range dependency between crowd flows in different regions. In this paper, we propose GridFormer, a Transformer network in which a periodically shifted sampling method and an attention mechanism handle temporal shifting in the daily and weekly periodicity, and a pyramid 3D Swin Transformer network captures long-range spatial dependency in a hierarchical manner. The pyramid 3D Swin Transformer network also models spatial and temporal features jointly, enabling better interaction between the two domains. Experimental results on three crowd flow datasets demonstrate that GridFormer outperforms state-of-the-art crowd flow prediction methods.
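
To make the idea of periodically shifted sampling concrete, the following is a minimal sketch, not the authors' implementation. It assumes crowd flow is stored as a sequence of frames indexed by time interval, that one period (a day or a week) spans a fixed number of intervals, and that a plain scaled dot-product attention is used to weight the shifted candidates. The function names `shifted_period_indices` and `attend_over_shifts`, and the parameters `period`, `num_periods`, and `max_shift`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def shifted_period_indices(t, period, num_periods, max_shift):
    """Frame indices around t - k*period for k = 1..num_periods.

    Each row holds the strictly periodic index plus its neighbours within
    +/- max_shift intervals, so a later step can pick the best-aligned one.
    (Hypothetical helper; the abstract does not specify the exact scheme.)
    """
    rows = []
    for k in range(1, num_periods + 1):
        center = t - k * period
        rows.append([center + s for s in range(-max_shift, max_shift + 1)])
    return np.array(rows)


def attend_over_shifts(frames, query):
    """Weight shifted candidate features by similarity to a query.

    frames: array of shape (S, D), one feature vector per shifted candidate.
    query:  array of shape (D,), e.g. a feature of the most recent frame.
    Returns the attention-weighted feature and the weights themselves.
    """
    scores = frames @ query / np.sqrt(frames.shape[-1])
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights = weights / weights.sum()
    return weights @ frames, weights


# Example: 48 half-hour intervals per day (an assumed granularity),
# looking back two days with a shift of one interval on each side.
indices = shifted_period_indices(t=100, period=48, num_periods=2, max_shift=1)
```

Under these assumptions, `indices` contains one row per past period. Attending over each row lets the model favour the interval that best matches the current pattern rather than assuming the flow repeats at exactly the same time every day or week.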

Full Text