Abstract

For calculating non-life insurance premiums, actuaries traditionally rely on separate severity and frequency models that use covariates to explain the claims loss exposure. In this paper, we focus on the claim severity. First, we build two reference models, a generalized linear model and a generalized additive model, relying on a log-normal distribution of the severity and including the most significant factors. In doing so, we relate the continuous variables to the response in a nonlinear way. In the second step, we tune two random forest models, one for the claim severity and one for the log-transformed claim severity, where the latter requires a back-transformation of the predicted results. We compare the prediction performance of the different models using the relative error, the root mean squared error and the goodness-of-lift statistics in combination with goodness-of-fit statistics. In our application, we rely on a dataset of a Swiss collision insurance portfolio covering the loss exposure of the period from 2011 to 2015, and including observations from 81 309 settled claims with a total amount of CHF 184 million. In the analysis, we use the data from 2011 to 2014 for training and the data from 2015 for testing. Our results indicate that a log-normal transformation of the severity does not lead to performance gains with random forests. However, random forests with a log-normal transformation are the preferred choice for explaining right-skewed claims. Finally, when considering all indicators, we conclude that the generalized additive model has the best overall performance.
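As a rough illustration of two of the error measures named in the abstract, the following sketch computes the root mean squared error over individual claims and a relative error on the total claim amount. The exact definition of the relative error is not given in the abstract, so the formula used here (predicted total minus observed total, divided by observed total) is an assumption, as are the function names and the made-up claim amounts.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error over individual claims."""
    n = len(observed)
    squared_errors = ((p - o) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sum(squared_errors) / n)


def total_relative_error(observed, predicted):
    """Relative error of the predicted total claim amount.

    Assumed definition: (sum of predictions - sum of observations)
    divided by the sum of observations.
    """
    total_observed = sum(observed)
    return (sum(predicted) - total_observed) / total_observed


# Made-up claim amounts in CHF, for illustration only (not from the paper).
observed = [1000.0, 2000.0, 3000.0]
predicted = [1100.0, 1900.0, 3300.0]

print(rmse(observed, predicted))
print(total_relative_error(observed, predicted))
```

The two measures answer different questions: the RMSE penalizes errors on individual claims, whereas the total relative error can be small even when individual errors are large, as long as they cancel out.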

Highlights

  • This paper compares the claim severity modeling and the predictions of generalized additive models (GAM), generalized linear models (GLM), and random forest (RF) models when applied to the same car collision dataset from a Swiss insurer

  • The traditional regression models rely on a log-normal distribution, which implies an exponential back-transformation of the predictions

  • We compare the performance of GAM, GLM, and RF models on a test sample from several perspectives: individual and total errors, violin plots, and model predictions along selected profiles
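The exponential back-transformation mentioned in the highlights can be sketched as follows. If log Y is normally distributed with mean mu and variance sigma², then exp(mu) is the median of Y, while the mean is exp(mu + sigma²/2). The sketch below applies this standard log-normal result to predictions made on the log scale. Whether the paper applies exactly this correction, and how it estimates the variance, is not stated in the text above, so the function names and the simple residual-based variance estimate are assumptions.

```python
import math


def residual_variance(log_residuals):
    """Mean squared residual on the log scale.

    Assumption: a plain mean of squared residuals, with no adjustment
    for the number of fitted parameters.
    """
    return sum(r * r for r in log_residuals) / len(log_residuals)


def back_transform(log_predictions, sigma2):
    """Map log-scale predictions to the original scale.

    Uses the log-normal mean exp(mu + sigma2 / 2). Plain exp(mu) would
    return the median and understate the expected severity for
    right-skewed claims.
    """
    return [math.exp(mu + 0.5 * sigma2) for mu in log_predictions]


# Made-up values for illustration only (not from the paper).
log_predictions = [7.0, 7.5, 8.0]     # predicted log-severities
log_residuals = [0.5, -0.5, 1.0, -1.0]  # training residuals on the log scale

sigma2 = residual_variance(log_residuals)
mean_predictions = back_transform(log_predictions, sigma2)
median_predictions = [math.exp(mu) for mu in log_predictions]

print(sigma2)
print(mean_predictions)
print(median_predictions)
```

Because sigma² is non-negative, the mean-based predictions are never smaller than the median-based ones; the gap grows with the spread of the log-scale residuals.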


Summary

Introduction

Actuaries rely on linear regression models to calculate premiums. Such models use explanatory variables, including the characteristics of the policyholder, of the insured risk, and of the contract configuration.
