Use of the multinomial logit (MNL) functional form is widespread in transportation and land use demand modeling. As the state of the art evolves away from aggregate forecasting and toward activity-based models and microsimulation, MNL models are being asked to accommodate increasingly large choice sets. Different strategies exist to address the challenge of estimating models with very large choice sets, but perhaps none is as commonly employed as sampling of alternatives. The effect of sampling of alternatives on parameter estimation has received considerable attention in the scientific literature. Yet comparatively little quantitative research examines the issues that arise when the same sampling strategies are applied in forecasting. In this study, we analyzed the effect of sampling of alternatives in discrete choice models for disaggregate location choice forecasting. First, we defined a novel measure of forecast error and used it to quantify the extent of the problem as a function of sample rate, model error, and size of the universe of alternatives. We then explored the potential for strategic sampling techniques to reduce the error observed under simple random sampling. In general, we found that the proportion of aggregate demand misallocated owing to sampling of alternatives decreased as the size of the universe of alternatives increased. Additionally, simple random sampling outperformed importance sampling in most scenarios. These findings suggest that error resulting from random sampling of alternatives is less of a problem for logit-based microsimulation models than previously assumed.
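To make the forecasting setup concrete, the following is a minimal sketch of simple random sampling of alternatives in an MNL simulation, under stated assumptions. The function names, the utilities, the problem sizes, and in particular `misallocated_share` are hypothetical illustrations; the error measure shown is a generic half-absolute-difference share, not the measure defined in the study.

```python
import math
import random

def mnl_probabilities(utilities):
    """Standard MNL (softmax) probabilities for a list of systematic utilities."""
    top = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def simulate_demand(utilities, n_agents, sample_size, rng):
    """Count choices per alternative when each agent draws its own simple
    random sample (without replacement) from the universe of alternatives."""
    counts = [0] * len(utilities)
    for _ in range(n_agents):
        sampled = rng.sample(range(len(utilities)), sample_size)
        probs = mnl_probabilities([utilities[j] for j in sampled])
        chosen = rng.choices(sampled, weights=probs, k=1)[0]
        counts[chosen] += 1
    return counts

def misallocated_share(counts, expected):
    """Illustrative error measure (an assumption, not the study's metric):
    half the summed absolute gap between simulated and expected demand,
    expressed as a fraction of total demand."""
    total = sum(expected)
    return sum(abs(c - e) for c, e in zip(counts, expected)) / (2 * total)

rng = random.Random(0)
n_alts, n_agents = 200, 2000  # hypothetical sizes
utilities = [rng.gauss(0.0, 1.0) for _ in range(n_alts)]  # hypothetical utilities

# Expected demand when every agent considers the full universe of alternatives.
expected = [n_agents * p for p in mnl_probabilities(utilities)]

# Simulated demand when each agent sees only a 10% random sample.
counts = simulate_demand(utilities, n_agents, sample_size=20, rng=rng)
error = misallocated_share(counts, expected)
```

Varying `sample_size` and `n_alts` in a sketch like this gives a rough feel for how sample rate and universe size interact, although the simulated counts also carry ordinary Monte Carlo noise that a careful analysis would need to separate out.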