The random parameters Generalized Linear Model (GLM) is frequently used to model speeding characteristics and capture the heterogeneous effects of factors. However, this statistical approach is seldom employed for prediction and generalization because its predefined error terms are difficult to transfer. Recently, the emergence of explainable AI techniques has opened a new path for analyzing factors associated with risky driving behaviors. Despite this, a gap remains in comparing results from machine learning and deep learning (ML/DL) approaches with those from the random parameters GLM. This study applies the random parameters GLM and explainable deep learning to evaluate the heterogeneous effects of factors on taxis’ high-range speeding likelihood. First, a Beta GLM with random parameters (BGLM-RP) is developed to model the high-range speeding likelihood among taxi drivers. Additionally, XGBoost, a simple convolutional neural network (Simple-CNN), a deeper CNN (DCNN), and a deeper CNN with self-attention (DCNN-SA) are developed. The quantified explanations and illustrations of the factors’ heterogeneous effects from the ML/DL models are derived from pseudo coefficients obtained by decomposing the factors’ SHapley Additive exPlanations (SHAP) values. All the developed statistical, ML, and DL models are compared in terms of mean absolute error and mean squared error on the testing data and the full data. Results show that DCNN-SA excels in prediction on the testing data, indicating superior generalization capability, while BGLM-RP outperforms the other models on the full data. DCNN-SA can reveal the heterogeneous effects of factors for both in-sample and out-of-sample data, which is not possible for the random parameters GLM. However, BGLM-RP can reveal larger magnitudes of the factors’ heterogeneous effects for in-sample data. 
The signs and significance levels of the varying coefficients from BGLM-RP match those of the pseudo coefficients from the ML/DL models, supporting the validity and rationale of the proposed explanation framework for quantifying the factors’ effects in ML/DL models. The study also discusses the contributions of various factors to the high-range speeding likelihood of taxi drivers.
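
The comparison metrics named in the abstract, mean absolute error and mean squared error, can be illustrated with a minimal Python sketch. The function names and the sample likelihood values below are hypothetical and chosen only for illustration; they are not taken from the study's data or code.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of |observed - predicted| over all samples."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of (observed - predicted)^2 over all samples."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical speeding likelihoods in (0, 1); not values from the study.
observed = [0.10, 0.40, 0.25]
predicted = [0.20, 0.20, 0.25]

mae = mean_absolute_error(observed, predicted)
mse = mean_squared_error(observed, predicted)
print(f"MAE={mae:.4f}  MSE={mse:.4f}")
```

Evaluating both metrics separately on held-out testing data and on the full data, as the abstract describes, distinguishes generalization from in-sample fit: a model can score well on one and poorly on the other.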