Comparing SMOTE Family Techniques in Predicting Insurance Premium Defaulting using Machine Learning Models

Mohamed Hanafy Kotb, Ruixing Ming

doi:10.14569/ijacsa.2021.0120970

Abstract

Defaults in premium payments significantly affect the profitability of insurance companies, so predicting defaults in advance is very important for insurers. Thanks to technological advancements, prediction in the insurance sector is one of the most beneficial and important areas of study today. However, because datasets in this industry are imbalanced, predicting insurance premium defaulting is a difficult task. Moreover, no study has applied and compared different SMOTE family approaches to address the issue of imbalanced data. This study therefore compares different SMOTE family approaches, namely Synthetic Minority Oversampling Technique (SMOTE), Safe-level SMOTE (SLS), Relocating Safe-level SMOTE (RSLS), Density-based SMOTE (DBSMOTE), Borderline-SMOTE (BLSMOTE), Adaptive Synthetic Sampling (ADASYN), Adaptive Neighbor Synthetic (ANS), SMOTE-Tomek, and SMOTE-ENN, for solving the problem of imbalanced data. A variety of machine learning (ML) classifiers were applied to assess the performance of the SMOTE family in addressing the imbalance problem. These classifiers include Logistic Regression (LR), CART, C4.5, C5.0, Support Vector Machine (SVM), Random Forest (RF), Bagged CART (BC), AdaBoost (ADA), Stochastic Gradient Boosting (SGB), XGBoost (XGB), Naive Bayes (NB), k-Nearest Neighbors (K-NN), and Neural Networks (NN). Models were validated using a random hold-out strategy. The findings, obtained using various assessment measures, show that ML algorithms do not perform well with imbalanced data, indicating that the problem of imbalanced data must be addressed. By contrast, using balanced datasets created by SMOTE family techniques improves the performance of the classifiers. The Friedman test, a statistical significance test, further confirms that the hybrid SMOTE family methods are better than the others, especially SMOTE-Tomek, which performs better than the other resampling approaches. Among the ML algorithms, the SVM model produced the best results with SMOTE-Tomek.
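The core idea shared by the SMOTE family can be illustrated with a small sketch: a synthetic minority sample is placed at a random position on the segment between a minority sample and one of its k nearest minority-class neighbours. The following is a minimal standard-library illustration of that interpolation step only, not the implementation used in the study; the function name `smote_sketch`, its parameters, and the toy data are assumptions made for this example.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class (assumed data): four points with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, it always falls inside the region already spanned by the minority class. The variants listed in the abstract differ mainly in which base samples and neighbours they select, and the hybrid methods (SMOTE-Tomek, SMOTE-ENN) add a cleaning step after oversampling.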

Highlights

In the era of the industrial revolution, all businesses seek digital transformation.
The results showed that the Random Forest outperforms the other two algorithms on the insurance claim dataset.
The most important outcomes are from Table IV; there is a substantial discrepancy between specificity and sensitivity with the unbalanced data.
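The gap between specificity and sensitivity on unbalanced data can be shown with a small worked example. The sketch below assumes that defaulters are the positive class and uses made-up labels (95 non-defaulters, 5 defaulters); the helper name `sensitivity_specificity` and the data are hypothetical and do not reproduce Table IV.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    Hypothetical helper for illustration; not taken from the paper.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Assumed toy data: 95 non-defaulters (0) and 5 defaulters (1).
y_true = [0] * 95 + [1] * 5
# A classifier that always predicts the majority class.
y_pred = [0] * 100

sens, spec = sensitivity_specificity(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy case the classifier reaches 95% accuracy and perfect specificity while detecting no defaulters at all, which is why accuracy alone is a poor guide on imbalanced data.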

Summary

Introduction

In the era of the industrial revolution, all businesses seek digital transformation. One of the key elements of digital transformation is the ability to manage data. Data science and business analytics are the tools employed to extract hidden insights from that data. Because the amount of data is increasing exponentially, the systematic process of data science has been gaining popularity in recent times. The insurance industry is no exception, and it is one of the key areas where data science is practiced at a large scale. Many insurance companies are employing ML techniques, which provide a more systematic way of obtaining accurate and representative outcomes than the traditional statistical approach.



Journal: International Journal of Advanced Computer Science and Applications
Publication Date: Jan 1, 2021
Citations: 4
License type: cc-by


Similar Papers

Joint modeling strategy for using electronic medical records data to build machine learning models: an example of intracerebral hemorrhage
Jianxiang Tang ... Hongli Wan
BMC Medical Informatics and Decision Making | VOL. 22 | 25 Oct 2022

Machine Learning-Based Predictive Models of Behavioral and Psychological Symptoms of Dementia
Eunhee Cho ... Byoung Seok Ye
Innovation in Aging | VOL. 5 | 17 Dec 2021

Applying machine learning methods to predict geology using soil sample geochemistry
Timothy C.C Lui ... Sharon A Cowling
Applied Computing and Geosciences | VOL. 16 | 11 Aug 2022

An improved hybrid model for cardiovascular disease detection using machine learning in IoT
Arslan Naseer ... Fahim Arif
Expert Systems | VOL. - | 19 Dec 2023
