Abstract

Real-world complex systems such as transportation and insurance constantly produce massive amounts of data, and the number of variables used to describe these systems can be overwhelming. Predictive modelling of such high-dimensional data often requires more complex models, which can lower both model interpretability and predictive performance. Balancing a model’s interpretability against its prediction accuracy is therefore crucial: a good balance improves prediction reliability and helps keep the overall control and management of complex systems statistically optimal. However, measuring variable importance to reduce the dimensionality of a given problem can be challenging because of differences in data sources, in the methods employed, or in both. This paper proposes a novel approach to variable selection based on a comprehensive variable importance measure. The proposed method formulates variable selection as a multi-criteria decision analysis (MCDA) problem and solves it with the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS). It offers a systematic way to select variables for constructing more interpretable predictive models. Our contribution is a thorough investigation using simulations with known model characteristics, together with a rigorous test of the method’s robustness on different datasets under different noise scenarios. To demonstrate the effectiveness of the approach, we apply it to the variable selection problem for the national collision database and identify the variables that may most strongly influence the prediction of fatal accident rates.
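To make the MCDA formulation concrete, the following is a minimal sketch of a standard TOPSIS ranking in which candidate variables are the alternatives and individual variable importance measures are the criteria. It is an illustration under stated assumptions, not the procedure used in the paper: the function name `topsis`, the variable names, the three importance measures, the equal weights, and all numeric scores are hypothetical.

```python
import numpy as np


def topsis(scores, weights, benefit):
    """Return TOPSIS closeness coefficients for each alternative (row).

    scores  : (n_alternatives, n_criteria) decision matrix
    weights : criterion weights (rescaled here to sum to 1)
    benefit : True where a larger criterion value is better
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)
    weights = weights / weights.sum()

    # Vector-normalise each criterion column, then apply the weights.
    norms = np.linalg.norm(scores, axis=0)
    norms[norms == 0] = 1.0  # guard against all-zero columns
    weighted = scores / norms * weights

    # Ideal and anti-ideal points, criterion by criterion.
    ideal = np.where(benefit, weighted.max(axis=0), weighted.min(axis=0))
    anti_ideal = np.where(benefit, weighted.min(axis=0), weighted.max(axis=0))

    # Euclidean distance of every alternative to both reference points.
    d_best = np.linalg.norm(weighted - ideal, axis=1)
    d_worst = np.linalg.norm(weighted - anti_ideal, axis=1)

    # Closeness coefficient in [0, 1]; larger means closer to the ideal.
    total = d_best + d_worst
    return np.divide(d_worst, total, out=np.zeros_like(total), where=total > 0)


# Hypothetical example: four candidate variables scored by three
# importance measures (all values are made up for illustration).
names = ["var_a", "var_b", "var_c", "var_d"]
importance = [
    [0.9, 0.8, 0.7],
    [0.5, 0.4, 0.6],
    [0.1, 0.2, 0.1],
    [0.3, 0.3, 0.2],
]
closeness = topsis(importance, weights=[1, 1, 1], benefit=[True, True, True])
ranking = [names[i] for i in np.argsort(-closeness)]
print(ranking)
```

Variables would then be retained in order of decreasing closeness coefficient. How many to keep, and how the criteria are weighted, are modelling choices that this sketch leaves open.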
