Performance evaluation of mode choice models under balanced and imbalanced data assumptions

Shahrbanoo Rezaei, Anahita Khojandi, Antora Mohsena Haque, Candace Brakewood, Mingzhou Jin, Christopher Cherry

doi:10.1080/19427867.2021.1955567

Abstract

Data imbalance is a common limitation in mode choice modeling. Mode choice models, such as logit models, may produce biased estimates for alternatives with smaller shares and consequently have high prediction errors for those alternatives. Accurate prediction of the less commonly used modes is important in some applications, such as predicting transit mode share in many auto-oriented American cities, so it is essential to improve the prediction capability of logit models for those modes. This study therefore applies an imbalanced learning technique and evaluates the prediction capability and interpretability of logit models on both balanced and imbalanced datasets, using a case study for the City of Nashville, Tennessee. The results show that the proposed method improves the prediction accuracy for the less commonly used modes by 18% and the mean absolute percentage error by 2%, while keeping the models interpretable. Finally, we provide some high-level guidelines for mode choice modeling with imbalanced data.
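To make the idea of balancing a mode choice dataset concrete, the following minimal sketch may help. The abstract does not name the imbalanced learning technique used, so random oversampling is assumed here purely as a stand-in. The function names (`random_oversample`, `share_mape`), the toy data, and the choice to compute the mean absolute percentage error over mode shares are all illustrative assumptions, not the authors' method.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate rows of the smaller classes (sampling with replacement)
    until every class has as many rows as the largest class.
    Assumption: a stand-in for whichever technique the study applies."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx_parts = []
    for cls, n in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        # Draw the extra rows needed to reach the majority-class count.
        extra = rng.choice(cls_idx, size=target - n, replace=True)
        idx_parts.append(np.concatenate([cls_idx, extra]))
    idx = np.concatenate(idx_parts)
    return X[idx], y[idx]

def share_mape(observed, predicted):
    """Mean absolute percentage error (in percent) between observed and
    predicted mode shares. Assumption: shares are the quantity compared."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(observed - predicted) / observed) * 100)

# Toy, made-up data: 90 auto trips and 10 transit trips.
y = np.array(["auto"] * 90 + ["transit"] * 10)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature column

X_bal, y_bal = random_oversample(X, y)
balanced_counts = dict(zip(*np.unique(y_bal, return_counts=True)))
```

In a workflow of this kind, only the training split would typically be balanced, and a logit model fitted on it would then be evaluated on untouched, imbalanced test data so that the reported errors reflect real mode shares.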

Full Text