Abstract

Short-term metro passenger flow prediction is vital for the operation and management of metro systems. Most studies focus on achieving higher prediction accuracy with statistical and machine learning methods, but little attention has been paid to the prioritization and selection of feature variables, especially for different metro station types. This study analyzes the effect of feature variables on prediction results and selects appropriate predictor variables accordingly. A novel three-stage framework is proposed to prioritize feature variables for short-term metro passenger flow prediction, comprising station clustering, feature extraction, and variable prioritization. An agglomerative hierarchical clustering (AHC) algorithm is developed for station clustering, and its results are verified against K-means clustering and the Davies-Bouldin (DB) index. Temporal, spatial, and external features are then extracted. Finally, the association between the variables and the prediction results is explored using tree-based models. The proposed framework is demonstrated and validated with data collected from the Shanghai Metro Automatic Fare Collection (AFC) system. The results show that the importance of feature variables for model development varies between stations, yet only a few variables explain most of the variation in the testing dataset. Different feature variables lead to distinct differences in prediction accuracy, and simply adding more predictor variables does not necessarily improve it. In addition, the station type and prediction type (i.e., tap-in and tap-out) have little influence on the selection of feature variables.
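
To illustrate how a Davies-Bouldin (DB) index can be used to check a station clustering, the following is a minimal sketch rather than the implementation used in the study. It computes the standard DB index (the mean, over clusters, of the worst-case ratio of within-cluster scatter to between-centroid distance; lower is better) for two candidate groupings. The station profiles, the two features, and the label assignments are hypothetical values chosen for illustration, and the sketch assumes that no two cluster centroids coincide.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a clustering (lower means better separated).

    Assumes at least two clusters and distinct cluster centroids.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the DB index needs at least two clusters")

    ids = sorted(clusters)
    cents = {c: centroid(clusters[c]) for c in ids}
    # Scatter: mean distance from each member to its cluster centroid.
    scatter = {
        c: sum(math.dist(p, cents[c]) for p in clusters[c]) / len(clusters[c])
        for c in ids
    }
    # For each cluster, take its worst (largest) similarity to any other cluster.
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in ids
            if j != i
        )
        for i in ids
    ]
    return sum(worst) / len(ids)


# Hypothetical station profiles: (morning-peak share, evening-peak share).
stations = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]

# Two candidate groupings of the same four stations (illustrative labels).
good = davies_bouldin(stations, [0, 0, 1, 1])
bad = davies_bouldin(stations, [0, 0, 0, 1])

print(f"DB index, grouping A: {good:.3f}")
print(f"DB index, grouping B: {bad:.3f}")
```

In this toy example, the grouping that separates morning-dominated from evening-dominated stations yields the lower DB index, which is the sense in which the index can support one clustering result over another.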
