Abstract

Real-world complex systems such as transportation and insurance constantly produce massive amounts of data, and the number of variables used to describe these systems can be overwhelming. Predictive modelling of such high-dimensional data often requires more complex models, which can reduce both interpretability and predictive performance. Balancing a model’s interpretability against its prediction accuracy is therefore crucial: a good balance improves prediction reliability, which in turn supports statistically optimal control and management of complex systems. However, measuring variable importance in order to reduce the dimensionality of a problem can be challenging, because importance estimates may differ with the data source, the method employed, or both. This paper proposes a novel approach to variable selection based on a comprehensive variable importance measure. The proposed method formulates variable selection as a multi-criteria decision analysis (MCDA) problem and solves it with TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution), offering a systematic way to select variables for more interpretable predictive models. Our contribution is a thorough investigation using simulations with known model characteristics, together with a rigorous test of the method’s robustness on different datasets under different noise scenarios. To demonstrate the effectiveness of our approach, we apply it to the variable selection problem for the national collision database and identify the variables that may most strongly influence the prediction of fatal accident rates.
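
To illustrate how variable selection can be cast as an MCDA problem, the following is a minimal sketch of a standard TOPSIS ranking, not the paper's actual implementation. It assumes each candidate variable (an alternative) has been scored by several importance measures (the criteria), that higher scores are better on every criterion, and that criterion weights are given. The variable names, scores, and equal weights below are hypothetical and chosen only for illustration.

```python
import math


def topsis_closeness(scores, weights):
    """Return a TOPSIS closeness score in [0, 1] for each row of `scores`.

    scores  : list of rows, one per candidate variable; each row holds that
              variable's score on every criterion (higher is assumed better).
    weights : one non-negative weight per criterion.
    """
    n_criteria = len(weights)
    # Vector-normalise each criterion column (fall back to 1.0 for an all-zero column).
    norms = [
        math.sqrt(sum(row[j] ** 2 for row in scores)) or 1.0
        for j in range(n_criteria)
    ]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_criteria)]
        for row in scores
    ]
    # Ideal-best and ideal-worst points, taken per criterion.
    best = [max(row[j] for row in weighted) for j in range(n_criteria)]
    worst = [min(row[j] for row in weighted) for j in range(n_criteria)]

    closeness = []
    for row in weighted:
        d_best = math.dist(row, best)    # Euclidean distance to the ideal-best point
        d_worst = math.dist(row, worst)  # Euclidean distance to the ideal-worst point
        total = d_best + d_worst
        # If all alternatives coincide, no ranking is possible; return a neutral 0.5.
        closeness.append(d_worst / total if total > 0 else 0.5)
    return closeness


# Hypothetical importance scores from two hypothetical importance measures.
importance = {
    "speed": [0.9, 0.8],
    "weather": [0.5, 0.4],
    "weekday": [0.1, 0.2],
}
names = list(importance)
closeness = topsis_closeness([importance[n] for n in names], weights=[0.5, 0.5])

# Rank variables from most to least important by closeness to the ideal point.
ranking = [n for _, n in sorted(zip(closeness, names), reverse=True)]
print(ranking)
```

Under these assumptions, a variable that scores highest on every criterion coincides with the ideal-best point and receives a closeness of 1, while one that scores lowest on every criterion receives 0; a cut-off on the closeness score could then serve as the selection rule.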
