Abstract

Defaults in premium payments significantly affect the profitability of insurance companies, so predicting defaults in advance is very important to them. Thanks to technological advancements, prediction in the insurance sector is one of the most beneficial and important areas of study today. However, because datasets in this industry are imbalanced, predicting insurance premium defaults is a difficult task. Moreover, no study has applied and compared different SMOTE family approaches to address the issue of imbalanced data. This study therefore compares different SMOTE family approaches for handling imbalanced data: Synthetic Minority Oversampling Technique (SMOTE), Safe-level SMOTE (SLS), Relocating Safe-level SMOTE (RSLS), Density-based SMOTE (DBSMOTE), Borderline-SMOTE (BLSMOTE), Adaptive Synthetic Sampling (ADASYN), Adaptive Neighbor Synthetic (ANS), SMOTE-Tomek, and SMOTE-ENN. A variety of machine learning (ML) classifiers are applied to assess how well each SMOTE family method addresses the imbalance problem. These classifiers include Logistic Regression (LR), CART, C4.5, C5.0, Support Vector Machine (SVM), Random Forest (RF), Bagged CART (BC), AdaBoost (ADA), Stochastic Gradient Boosting (SGB), XGBoost (XGB), Naive Bayes (NB), k-Nearest Neighbors (K-NN), and Neural Networks (NN). Models are validated with a random hold-out strategy. The findings, obtained using various assessment measures, show that ML algorithms do not perform well with imbalanced data, indicating that the problem of imbalanced data must be addressed. In contrast, training on balanced datasets created by SMOTE family techniques improves the performance of the classifiers. The Friedman test, a statistical significance test, further confirms that the hybrid SMOTE family methods are better than the others; in particular, SMOTE-Tomek performs better than the other resampling approaches. Among the ML algorithms, the SVM model produced the best results with SMOTE-Tomek.
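
The core idea shared by the SMOTE family can be illustrated with a minimal sketch of basic SMOTE: each synthetic minority sample is placed at a random position on the segment between a minority sample and one of its k nearest minority neighbors. This is an illustration under stated assumptions and not the implementation used in the study; the function name `smote`, its parameters, and the toy data are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority neighbors.

    Hypothetical helper for illustration; `minority` is a list of
    equal-length numeric feature vectors from the minority class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        # Random position on the segment from the base to the chosen neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical toy data: 3 defaulters (minority) versus 9 non-defaulters.
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
majority_count = 9
new_points = smote(minority, majority_count - len(minority), k=2)
balanced_minority = minority + new_points
print(len(balanced_minority))
```

The other family members listed above change where the synthetic points are placed or which samples are kept afterwards. The hybrid methods SMOTE-Tomek and SMOTE-ENN, for example, follow oversampling with a cleaning step; the sketch does not cover that step.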

Highlights

  • In the era of the industrial revolution, all businesses seek digital transformation

  • The results showed that the Random Forest outperforms the other two algorithms on the Insurance claim dataset

  • The most important outcomes are from Table IV; there is a substantial discrepancy between specificity and sensitivity with the unbalanced data
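
The gap between specificity and sensitivity on unbalanced data can be seen in a small sketch. It assumes the minority class (default) is labeled 1 and uses hypothetical toy labels, not the study's data; the helper name `sensitivity_specificity` is also an assumption.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Sensitivity: share of actual positives that were detected.
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    # Specificity: share of actual negatives that were correctly rejected.
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical labels: 1 = default (minority), 0 = no default (majority).
y_true = [0] * 9 + [1]
always_majority = [0] * 10  # a classifier that never predicts a default
sens, spec = sensitivity_specificity(y_true, always_majority)
accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
print(sens, spec, accuracy)
```

A classifier that always predicts the majority class scores well on accuracy and specificity while detecting no defaults at all, which is why accuracy alone is a poor guide on imbalanced data.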

Summary

Introduction

In the era of the industrial revolution, all businesses seek digital transformation. One of the key elements of digital transformation is the ability to manage data. Data science and business analytics are the tools used to extract hidden insights from that data. Because the amount of data is increasing exponentially, the systematic process of data science has been gaining popularity in recent times. The insurance industry is no exception, and it is one of the key areas where data science is practiced at a large scale. Many insurance companies employ ML techniques, which provide a more systematic way of obtaining accurate and representative outcomes than traditional statistical approaches.
