Travel time reliability quantifies the variability of travel times and has become a critical aspect of evaluating transportation network performance. The empirical cumulative distribution function (CDF) of travel times has been used as a tool to preserve the inherent information on their variability and distribution. With advances in data collection technology, probe vehicle data have been used frequently to measure highway system performance. One challenge in using CDFs with large amounts of probe vehicle data is deciding how many distinct CDFs are needed to fully characterize the travel times experienced. This paper explores statistical methods for clustering segment-level travel time CDFs into an optimal number of homogeneous clusters that retain all relevant distributional information. Two clustering methods, one based on classic hierarchical clustering and the other on model-based functional data clustering, were tested on travel time data from Interstate 64 in Virginia to assess how well each clusters distributions. Freeway segments and segments within interchange areas were clustered separately. To determine the appropriate input format for clustering, both scaled and original travel times were considered. In addition, a non-data-driven method based on geometric features was included for comparison. The results showed that for freeway segments, hierarchical clustering of travel times using an Anderson–Darling dissimilarity matrix and Ward’s linkage performed best. For interchange segments, model-based clustering produced the best clusters. By grouping segments into homogeneous clusters, the results of this study could improve the efficiency of subsequent travel time reliability modeling.
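The hierarchical approach summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each segment is represented by its raw sample of travel times (which defines its empirical CDF), that the pairwise dissimilarity is the unnormalized two-sample Anderson–Darling statistic in the Scholz–Stephens form (without tie correction), and that the number of clusters is fixed in advance. The helper names `ad_two_sample` and `cluster_segments` are hypothetical, and the lognormal samples are synthetic stand-ins rather than I-64 data. Note that SciPy's Ward linkage formally assumes Euclidean distances, so applying it to Anderson–Darling dissimilarities is a heuristic.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def ad_two_sample(x, y):
    """Hypothetical helper: unnormalized two-sample Anderson-Darling statistic.

    Assumption: Scholz-Stephens A^2 form for k = 2 samples, no tie correction.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    z = np.sort(np.concatenate([x, y]))  # pooled, ordered sample
    n_total = z.size
    j = np.arange(1, n_total)            # ranks 1..N-1
    zj = z[:-1]                          # j-th smallest pooled value
    stat = 0.0
    for s in (x, y):
        m = np.searchsorted(s, zj, side="right")  # count of s <= z_(j)
        stat += np.sum((n_total * m - j * s.size) ** 2 / (j * (n_total - j))) / s.size
    return stat / n_total


def cluster_segments(samples, n_clusters):
    """Hypothetical helper: Ward clustering on an AD dissimilarity matrix."""
    k = len(samples)
    d = np.zeros((k, k))
    for a in range(k):
        for b in range(a + 1, k):
            d[a, b] = d[b, a] = ad_two_sample(samples[a], samples[b])
    tree = linkage(squareform(d), method="ward")  # condensed matrix input
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Synthetic example: three "fast" and three "slow" segments (seconds).
rng = np.random.default_rng(0)
samples = [rng.lognormal(mean=3.0, sigma=0.2, size=300) for _ in range(3)]
samples += [rng.lognormal(mean=4.0, sigma=0.2, size=300) for _ in range(3)]
labels = cluster_segments(samples, n_clusters=2)
print(labels)
```

In practice the number of clusters would be chosen by a validity criterion rather than fixed, and the scaled-versus-original comparison described above would amount to rescaling each sample before computing the dissimilarities.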