Improving Explainability of Major Risk Factors in Artificial Neural Networks for Auto Insurance Rate Regulation

Shengkun Xie

doi:10.3390/risks9070126

Abstract

In insurance rate-making, the use of statistical machine learning techniques such as artificial neural networks (ANN) is an emerging approach, and many insurance companies already use them for pricing. However, because such models are complex to specify and implement, model explainability may be essential to meet the pricing transparency required for rate regulation. This requirement may imply the need to estimate or evaluate variable importance when complicated models are used. Furthermore, from both rate-making and rate-regulation perspectives, it is critical to investigate the impact of major risk factors on response variables such as claim frequency or claim severity. In this work, we consider the modelling problems of how claim counts, claim amounts and average loss per claim are related to major risk factors. ANN models are applied to meet this goal, and because the models are complex, variable importance is measured to improve their explainability. The results obtained from different variable importance measurements are compared, and dominant risk factors are identified. The contribution of this work is in making advanced mathematical models usable in auto insurance rate regulation. This study analyzes major risks only, but the proposed method can be applied to more general insurance pricing problems in which additional risk factors are considered. In addition, the proposed methodology is useful for other business applications where statistical machine learning techniques are used.
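As a rough illustration of what measuring variable importance can look like, the sketch below uses permutation importance: shuffle one input column at a time and record how much the prediction error grows. This is an assumption chosen for illustration, since the abstract does not say which importance measures the paper compares. The names `mse`, `permutation_importance` and `toy_model` are hypothetical, and the toy model merely stands in for a fitted ANN.

```python
import random


def mse(model, X, y):
    """Mean squared error of a prediction function over rows X and targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Average increase in MSE when each input column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving the other columns untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical stand-in for a fitted model: only the first input matters."""
    return 3.0 * row[0] + 0.0 * row[1]


# Synthetic data, illustrative only (not taken from the paper).
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [toy_model(row) for row in X]

scores = permutation_importance(toy_model, X, y)
```

In this toy setting the first input should receive a clearly positive score and the second a score of zero, which is the kind of ranking used to identify dominant risk factors.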

Highlights

Unlike insurance pricing, which mainly targets the study of risk factors that significantly impact the calculation of insurance prices, we focus on major risk factors for insurance rate regulation purposes.
We have three different response variables: claim counts, claim amounts, and the average loss per claim count. All observations of these three response variables are transformed using the logarithmic function to improve the variance stability of the fitted model. (Note that, in rate regulation, we deal with aggregate loss, for which we do not have zero-loss problems, unlike the case for individual loss.) The input variables include accident year (AY), reporting year (RY), log-scale of the upper limit of
We can see that the artificial neural network (ANN) model ANN-c(4) with the average loss per claim count outperforms the ANN-c(4) with the claim amount. We compared this result with the results of generalized linear models (GLM), of which selected ones are reported in Figure 9, and found that ANN-c(4) outperforms the GLM.
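The logarithmic transformation of the three response variables mentioned in the highlights can be sketched as follows. This is a minimal illustration under stated assumptions: the records and the function name `log_responses` are hypothetical, and the average loss is taken as claim amount divided by claim count.

```python
import math

# Hypothetical aggregate cells as (claim_count, claim_amount); illustrative values only.
records = [(120, 540_000.0), (45, 310_500.0), (300, 1_260_000.0)]


def log_responses(count, amount):
    """Return log claim count, log claim amount and log average loss for one cell."""
    # Aggregate losses are assumed strictly positive, so the log is always defined.
    if count <= 0 or amount <= 0:
        raise ValueError("log transform needs strictly positive aggregates")
    log_count = math.log(count)
    log_amount = math.log(amount)
    log_avg_loss = math.log(amount / count)
    return log_count, log_amount, log_avg_loss


transformed = [log_responses(c, a) for c, a in records]
```

On the log scale the three responses are linked by log(average loss) = log(claim amount) - log(claim count), so any two of them determine the third.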

Summary

Introduction

Even if insurance rates are not regulated, the merit of predictive modelling is still apparent, as its use in pricing helps to avoid the adverse selection of insurance policies (Dionne et al. 1999). The use of machine learning techniques such as artificial neural networks (ANN) has been an emerging approach for insurance pricing. They can often achieve a high level of model prediction accuracy (Fialova and Folvarcna 2020; Gao and Wüthrich 2018; İşeri and Karlık 2009; Sun et al. 2017; Wüthrich 2019; Yeo et al. 2001).

Objectives

Methods

Results

Discussion

Conclusion