Abstract

A boosting-based machine learning algorithm is presented to model a binary response with a large imbalance, i.e., a rare event. The new method (i) reduces the prediction error for the rare class and (ii) approximates an econometric model that allows interpretability. RiskLogitboost regression combines a weighting mechanism that oversamples or undersamples observations according to their likelihood of misclassification with a generalized least squares bias-correction strategy to reduce the prediction error. An illustration using a real French third-party liability motor insurance data set is presented. The results show that RiskLogitboost regression improves the detection rate of rare events compared with several boosting-based and tree-based algorithms and with some existing methods designed for imbalanced responses.
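To illustrate how a boosting procedure can approximate an interpretable econometric model, the following is a minimal sketch of the standard two-class LogitBoost of Friedman, Hastie and Tibshirani with a linear weighted-least-squares base learner. It is not RiskLogitboost: the misclassification-based oversampling/undersampling weights and the generalized least squares bias correction described above are not reproduced. The function name `logitboost_linear` and the simulated data are hypothetical.

```python
import numpy as np


def logitboost_linear(X, y, n_iter=50, eps=1e-10):
    """Standard two-class LogitBoost with a linear WLS base learner.

    Illustrative sketch only (hypothetical function name). This is NOT
    RiskLogitboost: its rare-class weighting and GLS bias correction are
    not included here.
    """
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])    # design matrix with intercept
    beta = np.zeros(Xd.shape[1])             # accumulated coefficients of F
    F = np.zeros(n)                          # additive score; log-odds = 2 * F
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-2.0 * F))   # p = e^F / (e^F + e^-F)
        w = np.maximum(p * (1.0 - p), eps)   # working weights, floored for stability
        z = (y - p) / w                      # working response
        WX = Xd * w[:, None]
        # Weighted least squares fit of z on Xd: (Xd' W Xd) b = Xd' W z
        b = np.linalg.solve(Xd.T @ WX, WX.T @ z)
        F += 0.5 * (Xd @ b)                  # LogitBoost update: F <- F + f/2
        beta += 0.5 * b
    return 2.0 * beta                        # coefficients on the log-odds scale


# Hypothetical simulated rare-event data (roughly 5% events), illustration only.
rng = np.random.default_rng(0)
n = 20000
X = rng.normal(size=(n, 2))
eta = -3.5 + 1.0 * X[:, 0] - 0.5 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

coef = logitboost_linear(X, y, n_iter=25)
print("event rate:", y.mean())
print("log-odds coefficients:", coef)
```

With a linear base learner on all covariates, each LogitBoost step coincides with a Newton (iteratively reweighted least squares) step for logistic regression, so the returned coefficients converge to the logistic-regression maximum likelihood estimates and can be read as log-odds effects. RiskLogitboost modifies this kind of scheme with its own weighting and bias correction, which the sketch does not attempt to reproduce.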

Highlights

  • Research on rare events is steadily increasing in real-world applications of risk management

  • The illustration uses a publicly available data set from the R package CASdatasets. It contains 413,169 observations, recorded mostly over one year, on risk factors for third-party liability motor policies. The data set contains the following information about vehicle characteristics: the power of the car, ordered by category (Power); the car brand, divided into seven categories (Brand); and the fuel type, either diesel or regular (Gas)

  • The results provided by the RiskLogitboost regression suggest that the likelihood of a policyholder having an accident increased for vehicles in Power categories e, k, l, m, n, and o; in particular, drivers of o-type Power vehicles were the most likely to have an accident among all Power types


Summary

Introduction

Research on rare events is steadily increasing in real-world applications of risk management. Very few papers in this field have studied rare events in a binary response [25,26,27], and even fewer go beyond econometric methods; one exception is [9], which employs advanced machine learning methods. Many machine learning methods are regarded as black boxes in terms of interpretation. They are frequently interpreted through single metrics, such as classification accuracy, as if these were complete descriptions of complex tasks [32], and they cannot provide robust explanations for high-risk environments.

Background
Boosting Methods
Transformation : e
Penalized Regression Methods
Interpretable Machine Learning
The Rare Event Problem with RiskLogitboost Regression
RiskLogitboost Regression Weighting Mechanism to Improve Rare-Class Learning
Bias Correction with Weights
RiskLogitboost Regression
Illustrative Data
Discussion of Results
Predictive Performance of Extremes
Interpretable RiskLogitboost Regression
Findings
Conclusions
Methods