Traditional network models encapsulate travel behavior among all origin–destination pairs using a simplified, generic utility function for travelers. Typically, the utility function consists solely of travel time, and its coefficients are set to estimates obtained from discrete choice models and stated preference data. While this modeling strategy is reasonable, the sampling bias inherent in individual-level experimental data may be further amplified during network flow aggregation, leading to inaccurate flow estimates. In addition, individual-level data must be collected from surveys or travel diaries, which can be labor-intensive and costly, and which typically cover only a short time horizon. To address these limitations, this study extends classical bi-level formulations to estimate travelers’ utility functions with multiple attributes using system-level data. These data tend to be less subject to sampling bias than individual-level data, are cheaper to collect, and have become increasingly diverse and available. To leverage system-level data, we formulate a methodology grounded in non-linear least squares to statistically infer travelers’ utility function in the network context using traffic counts, traffic speeds, the number of traffic incidents, and sociodemographic information obtained from the US Census, among other attributes. An analysis of the mathematical properties of the optimization problem, including its pseudo-convexity, motivates the use of normalized gradient descent, an algorithm developed in the machine learning community that is suitable for pseudo-convex programs. More importantly, we develop a hypothesis testing framework to examine the statistical properties of the coefficients attached to utility terms and to perform attribute selection.
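The combination of non-linear least squares, normalized gradient descent, and coefficient t-tests can be illustrated on a deliberately tiny problem. The sketch below is not the authors' bi-level method. It assumes a single hypothetical O–D pair served by six parallel routes with fixed attributes (no congestion, so the lower-level equilibrium reduces to a logit split of demand) and fully observed route counts. It fits the utility coefficients by minimizing squared count residuals with normalized gradient descent, which always steps a distance η_t along −g/‖g‖, and then computes t-statistics from the standard asymptotic approximation Cov(θ̂) ≈ σ̂²(JᵀJ)⁻¹. The route attributes, demand, true coefficients, fixed count perturbation, geometrically decaying step size, and helper names (`route_flows`, `ngd`, `t_statistics`) are all illustrative assumptions.

```python
import math

# Hypothetical toy network: one O-D pair served by six parallel routes.
# Each route has two attributes: (travel time in minutes, number of incidents).
ATTRS = [(10, 2), (11, 0), (12, 1), (13, 0), (14, 2), (15, 1)]
DEMAND = 1000.0  # assumed trips for the single O-D pair


def route_flows(theta):
    """Logit route flows x_r = q * exp(v_r) / sum_s exp(v_s), with v_r = theta . z_r."""
    v = [sum(t * z for t, z in zip(theta, zr)) for zr in ATTRS]
    vmax = max(v)  # subtract the max utility for numerical stability
    w = [math.exp(vi - vmax) for vi in v]
    total = sum(w)
    return [DEMAND * wi / total for wi in w]


def jacobian(theta):
    """J[r][k] = dx_r/dtheta_k = x_r * (z_rk - sum_s p_s z_sk)."""
    x = route_flows(theta)
    zbar = [sum(xr * zr[k] for xr, zr in zip(x, ATTRS)) / DEMAND for k in range(len(theta))]
    return [[xr * (zr[k] - zbar[k]) for k in range(len(theta))] for xr, zr in zip(x, ATTRS)]


def loss_and_grad(theta, counts):
    """Least-squares loss f = sum_r (x_r - y_r)^2 and its gradient 2 J^T (x - y)."""
    resid = [xr - yr for xr, yr in zip(route_flows(theta), counts)]
    J = jacobian(theta)
    grad = [2.0 * sum(e * row[k] for e, row in zip(resid, J)) for k in range(len(theta))]
    return sum(e * e for e in resid), grad


def ngd(counts, theta0, eta0=0.3, decay=0.995, iters=3000):
    """Normalized gradient descent: theta <- theta - eta_t * g / ||g||; returns the best iterate."""
    theta = list(theta0)
    best, best_f = list(theta), float("inf")
    for t in range(iters):
        f, g = loss_and_grad(theta, counts)
        if f < best_f:
            best, best_f = list(theta), f
        norm = math.sqrt(sum(gk * gk for gk in g))
        if norm == 0.0:
            break
        eta = eta0 * decay ** t  # assumed step-size schedule for this sketch
        theta = [th - eta * gk / norm for th, gk in zip(theta, g)]
    return best


def invert(a):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                factor = m[r][c]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def t_statistics(theta, counts):
    """t_k = theta_k / se_k, with Cov ~= sigma^2 (J^T J)^-1 and sigma^2 = SSE / (n - k)."""
    n, k = len(counts), len(theta)
    sse = sum((xr - yr) ** 2 for xr, yr in zip(route_flows(theta), counts))
    sigma2 = sse / (n - k)
    J = jacobian(theta)
    jtj = [[sum(row[i] * row[j] for row in J) for j in range(k)] for i in range(k)]
    inv = invert(jtj)
    return [theta[i] / math.sqrt(sigma2 * inv[i][i]) for i in range(k)]


# Synthetic experiment: known coefficients plus a fixed perturbation standing in for count noise.
true_theta = [-0.5, -1.0]
noise = [4.0, -6.0, 3.0, -2.0, 5.0, -4.0]
counts = [x + e for x, e in zip(route_flows(true_theta), noise)]
theta_hat = ngd(counts, theta0=[0.0, 0.0])
t_stats = t_statistics(theta_hat, counts)
print("estimates:", theta_hat, "t-statistics:", t_stats)
```

Both estimates should land near `true_theta`. Under this approximation, an attribute would be retained when |t_k| exceeds the critical value; with only n − k = 4 degrees of freedom here, the Student t value of about 2.78 (5% two-sided) is more appropriate than the normal value of 1.96.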
Experiments on synthetic data show that the coefficients of travelers’ utility functions can be consistently recovered and that hypothesis tests reliably identify which attributes determine travelers’ route choices. In addition, a series of Monte Carlo experiments shows that statistical inference is robust to various levels of sensor coverage and to noise in both the origin–destination matrix and the traffic count measurements. The methodology is also deployed at a large scale using real-world multi-source data collected in Fresno, CA, before and during the COVID-19 outbreak.