Abstract This study presents a hybrid data-mining framework that combines feature selection algorithms with clustering to discover patterns in the rescheduling strategies (RSs) of high-speed railway trains. The proposed model has two stages. In the first stage, decision tree, random forest, gradient boosting decision tree (GBDT) and extreme gradient boosting (XGBoost) models are used to estimate feature importance, and the features with the greatest influence on RSs are selected. In the second stage, K-means clustering is applied to the features selected in the first stage to uncover the interdependences between RSs and the influencing features. The method quantifies the relationships between RSs and the influencing factors. The results show how the factors influence RSs, the likelihood of different train operation RSs under different situations, and the key time periods and key trains that controllers should pay more attention to. The findings can help train traffic controllers better understand train operation patterns and provide direction for optimizing rail traffic RSs.
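The two-stage idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn and NumPy, uses synthetic data with hypothetical feature names (`delay_min`, `headway_min`, `noise_a`, `noise_b`) and a hypothetical binary RS label, averages the importances across models (an assumption, since the abstract does not say how the model rankings are combined), and leaves out XGBoost to avoid an extra dependency.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): four hypothetical features,
# of which only the first two actually drive the RS label.
rng = np.random.default_rng(0)
n_samples = 600
feature_names = ["delay_min", "headway_min", "noise_a", "noise_b"]
X = rng.normal(size=(n_samples, len(feature_names)))
# Hypothetical binary RS label (1 = strategy applied, 0 = not applied).
y = (X[:, 0] + 0.8 * X[:, 1] + 0.1 * rng.normal(size=n_samples) > 0).astype(int)


def mean_feature_importance(X, y):
    """Stage 1: fit several tree-based models and average their importances."""
    models = [
        DecisionTreeClassifier(max_depth=4, random_state=0),
        RandomForestClassifier(n_estimators=100, random_state=0),
        GradientBoostingClassifier(random_state=0),
    ]
    importances = [model.fit(X, y).feature_importances_ for model in models]
    return np.mean(importances, axis=0)


mean_importance = mean_feature_importance(X, y)

# Keep the top-k features by averaged importance (k = 2 is an arbitrary choice).
top_k = 2
selected = np.argsort(mean_importance)[::-1][:top_k]
selected_names = [feature_names[i] for i in selected]

# Stage 2: cluster the samples on the selected, standardized features.
X_selected = StandardScaler().fit_transform(X[:, selected])
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X_selected)

# Share of samples with RS label 1 in each cluster: a rough view of how
# likely the strategy is under each cluster's operating conditions.
rs_share = {
    int(c): float(y[cluster_ids == c].mean()) for c in np.unique(cluster_ids)
}

print("selected features:", selected_names)
print("RS share per cluster:", rs_share)
```

In this sketch the per-cluster share of the RS label plays the role of the quantitative relationship between an RS and the influencing features; with real dispatching records, the number of clusters and the number of retained features would need to be chosen from the data.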