Application of CART-Based Modeling in Motor Insurance Fraud

Rohan Yashraj Gupta, Satya Sai Mudigonda, Phani Krishna Kandala, Pallav Kumar Baruah

doi:10.1201/9781003187059-17

Abstract

Artificial intelligence (AI) gives computer systems the ability to learn and improve automatically from data without being explicitly programmed. The process of “learning” starts with observing the data to look for patterns in it. Machine learning (ML) methods can be categorized as supervised, unsupervised, and semi-supervised. In supervised learning, an algorithm learns a function that maps the input variable (X) to the output variable (Y). The training dataset consists of input data (X) with known corresponding outputs (Y), and the aim is to find a function good enough that, when new data (X) is given as input, the model can predict the output (Y) for it. In unsupervised learning, the properties of the given data are studied without a known output. In some cases, however, the response or output for the input data is only partly known; such problems are addressed using semi-supervised learning methods. Classification and regression tree (CART) based models fall under the category of supervised learning. When the problem at hand has a discrete output variable, e.g. “fraud” or “not fraud”, it is called a classification problem; when it has a continuous output variable, e.g. “temperature” or “volume”, it is called a regression problem. The gradient boosting method (GBM) is one of several ML techniques used for classification and regression problems. Boosting is an iterative method that aims to combine many weak predictions into one powerful prediction. The application of such machine learning models in the insurance industry is being explored by researchers all over the world. Potential areas where machine learning can be used in the insurance industry include underwriting and claim analytics, product pricing, claims handling, fraud detection, and sales and customer experience.
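To make the idea of boosting concrete, the following is a minimal sketch, not the authors' model. It assumes a squared-error loss, one-split trees ("stumps") as the weak learners, and a single hypothetical feature (`claim_amounts`) with made-up 0/1 fraud labels. Each round fits a stump to the current residuals and adds a shrunken copy of it to the ensemble; all names, data, and parameter values are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Stump:
    """A one-split regression tree: the weak learner in this sketch."""
    threshold: float
    left_value: float
    right_value: float

    def predict(self, x: float) -> float:
        return self.left_value if x <= self.threshold else self.right_value


def fit_stump(xs: Sequence[float], residuals: Sequence[float]) -> Stump:
    """Pick the split on xs that minimises squared error against residuals."""
    mean_all = sum(residuals) / len(residuals)
    # Fallback: a constant stump, used if no split improves on the mean.
    best = Stump(float("inf"), mean_all, mean_all)
    best_err = sum((r - mean_all) ** 2 for r in residuals)
    # Skip the largest value so the right-hand leaf is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if err < best_err:
            best, best_err = Stump(threshold, left_mean, right_mean), err
    return best


@dataclass
class BoostedModel:
    base: float            # initial prediction: the mean of the training labels
    learning_rate: float   # shrinkage applied to every stump
    stumps: List[Stump] = field(default_factory=list)

    def predict(self, x: float) -> float:
        return self.base + self.learning_rate * sum(s.predict(x) for s in self.stumps)


def fit_boosted(xs, ys, n_rounds: int = 50, learning_rate: float = 0.1) -> BoostedModel:
    """Gradient boosting for squared loss: each round fits a stump to the residuals."""
    model = BoostedModel(base=sum(ys) / len(ys), learning_rate=learning_rate)
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is the residual y - prediction.
        residuals = [y - model.predict(x) for x, y in zip(xs, ys)]
        model.stumps.append(fit_stump(xs, residuals))
    return model


def mse(model: BoostedModel, xs, ys) -> float:
    return sum((y - model.predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical toy data: one feature and a 0/1 label (1 = "fraud").
claim_amounts = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
is_fraud = [0, 0, 0, 1, 0, 1, 1, 1]

one_round = fit_boosted(claim_amounts, is_fraud, n_rounds=1)
many_rounds = fit_boosted(claim_amounts, is_fraud, n_rounds=50)
print(mse(one_round, claim_amounts, is_fraud), mse(many_rounds, claim_amounts, is_fraud))
```

A single stump is a weak predictor, but the training error falls as more rounds are added. In this sketch a score of 0.5 or more could be read as “fraud”; a production classifier would more typically use a logistic loss and multi-level CART trees rather than squared error and stumps.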
Fraud detection in insurance using machine learning techniques is an area of active research. A research team at the Society of Actuaries identified some of the widely used approaches from 27 of 450 research articles and studies; these include data mining, statistical analysis, regression, stratified sampling, Monte Carlo simulation, and random sampling. Most of the methods identified are machine learning methods, which indicates how extensively machine learning is being used in the area of fraud detection. The ability to detect fraud has social relevance and several benefits. Reduced fraud would decrease the losses faced by the insurer. This would directly or indirectly reduce the premiums charged to policyholders and increase confidence in the financial system.

Full Text