The ability to predict future crowd trends has made crowd flow prediction increasingly critical to building intelligent transportation systems and has attracted substantial research effort. Crowd flow trends are closely tied to time and to urban topography, so extracting and leveraging both spatial and temporal features are key ingredients for predicting crowd flows effectively. Many previous works extract spatial features from crowd-flow data iteratively. As a result, these models incur a heavy computational cost while ignoring details of road topology and structural information. Meanwhile, temporal features, both short-term and long-term, are extracted separately. Fusing all features only at the final stage before prediction also neglects the underlying associations among them. To address these limitations, we derive spatial features from structural information about the road network, such as road connectivity, road density, and road width. Rather than extracting spatial features from crowd-flow data, we capture them from images of city maps using convolutional neural networks. Moreover, we implement a new sequence feature fusion mechanism that merges spatial features with temporal features from multiple time scales to predict crowd flows. We conduct extensive experiments to evaluate our model on three benchmark datasets. The experimental results demonstrate that the model outperforms 15 state-of-the-art methods. The source code is available at: https://github.com/CVisionProcessing/SPRNN.