Abstract

Incident duration models are often developed to assist incident management and traveler information dissemination. With recent advances in data collection and management, enormous archived incident data are now available for incident model development. However, a large volume of data may present challenges to practitioners, such as data processing and computation burdens. In addition, data that span multiple years may have inconsistency issues because of changes in data collection environments and procedures. A practical question may arise in the incident modeling community: is that much data really necessary (“all-in”) to build models, and if not, how much data is needed? To answer these questions, this study investigates the relationship between data sample size and the reliability of incident duration analysis models. The study proposed and demonstrated a sample size determination framework through a case study using data on over 47,000 incidents, in which a series of hazard-based duration models was estimated with varying sample sizes. The relationships between sample size and model performance, along with the estimation outcomes (i.e., coefficients and significance levels), were examined and visualized. The results showed that the variation of the estimated coefficients decreases as the sample size increases and stabilizes once the sample size reaches a critical threshold value; this threshold may serve as the recommended sample size. The case study suggested that a sample size of 6,500 is sufficient for a reliable incident duration model, although the critical value may vary significantly with different data and model specifications. Further implications are discussed in the paper.
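The framework summarized above lends itself to a simple resampling experiment. The following is a minimal sketch (not the paper's code) of one way to carry it out, assuming a pandas DataFrame of incident records with hypothetical columns `duration` (incident duration in minutes), `cleared` (event indicator), and covariates, and using the lifelines `WeibullAFTFitter` as one common hazard-based specification. The spread of the estimated coefficients across repeated subsamples is tracked as the sample size grows; the size at which the spread flattens out plays the role of the critical threshold.

```python
# Illustrative sketch of a sample-size sensitivity check for a hazard-based
# (Weibull AFT) duration model. Column names 'duration' and 'cleared' and the
# candidate sample sizes are assumptions for this example, not from the paper.
import numpy as np
import pandas as pd
from lifelines import WeibullAFTFitter


def coefficient_spread(df, sample_sizes, n_repeats=20, seed=0):
    """For each candidate sample size, refit the model on random subsamples
    and record the standard deviation of each estimated coefficient."""
    rng = np.random.default_rng(seed)
    spreads = {}
    for n in sample_sizes:
        coefs = []
        for _ in range(n_repeats):
            sub = df.sample(n=n, replace=False,
                            random_state=int(rng.integers(10**9)))
            aft = WeibullAFTFitter()
            # lifelines treats all remaining columns as covariates.
            aft.fit(sub, duration_col="duration", event_col="cleared")
            coefs.append(aft.params_)
        spreads[n] = pd.concat(coefs, axis=1).std(axis=1)
    # Rows: coefficients; columns: sample sizes. A spread that stops shrinking
    # as n grows suggests the critical (recommended) sample size.
    return pd.DataFrame(spreads)
```

In practice, one would plot each row of the returned table against the sample size and look for the point where the curves level off, analogous to the stabilization of coefficient variation reported in the case study.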

