Predicting metro incident duration is crucial for helping passengers and transit operators choose appropriate response strategies. Most existing research focuses on structured data, while the rich information embedded in unstructured incident logs is often neglected. This paper incorporates a probabilistic topic model tailored for short texts, the biterm topic model, into generic incident duration prediction models. By capturing word co-occurrence patterns through Bayesian inference, the biterm topic model extracts hidden topics from incident narratives, and each topic serves as a condensed summary of detailed incident causes and countermeasures. These extracted topics are then combined with structured information to serve as predictors. We validated our model using five years of incident data from the Hong Kong Mass Transit Railway across two scenarios: one in which all incident information is available, and one in which information is revealed over time. The results demonstrate that our method significantly improves prediction accuracy, particularly for incidents lasting longer than 30 minutes.