Abstract

Roughly 10 percent of the insurance industry’s incurred losses are estimated to stem from fraudulent claims. One solution is to use tabular data to construct models that distinguish legitimate claims from fraudulent ones. However, while canonical tabular data models enable robust fraud detection, complex sequential data have largely remained outside the insurance industry’s scope. For health insurance, we propose deep learning architectures that process insurance data consisting of sequential records of patient visits together with patient characteristics. Both the sequential and the tabular components improve model quality, generating new insights into the detection of health insurance fraud. Empirical results on data from a health insurance company show that our approach outperforms state-of-the-art models and can substantially improve the claims management process: we obtain a ROC AUC of 0.873, whereas the best state-of-the-art competitor achieves 0.815. Moreover, we demonstrate that our architectures are more robust to data corruption. As more semi-structured event sequence data become available to insurers, our methods will be valuable for many similar applications, particularly when variables have a large number of categories, such as International Classification of Diseases (ICD) codes or other classification codes.
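The ROC AUC values reported above (0.873 versus 0.815) can be read as the probability that a model scores a randomly chosen fraudulent claim higher than a randomly chosen legitimate one. The following minimal pure-Python sketch implements that pairwise definition; the function name `roc_auc` and the toy labels and scores are illustrative assumptions, not the paper's data or code.

```python
def roc_auc(labels, scores):
    """ROC AUC via its pairwise (Mann-Whitney) definition.

    labels: 1 for a fraudulent claim, 0 for a legitimate claim.
    scores: model scores, higher meaning "more likely fraudulent".
    Returns the fraction of (fraud, legitimate) pairs in which the
    fraudulent claim scores higher; tied pairs count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:          # O(len(pos) * len(neg)): fine for illustration only
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 (fraud, legitimate) pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The quadratic pairwise loop keeps the definition transparent; for large claim portfolios a rank-based formulation or a library routine is the practical choice.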

Highlights

  • Fraud causes substantial costs and losses for the finance and insurance industries

  • We propose architectures for categorical sequence embeddings via deep learning that help improve the classification of fraudulent and valid claims compared to other machine learning methods

  • Results: we evaluate and compare the metrics of our classic machine learning and deep learning models
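To make the idea of categorical sequence embeddings concrete, the following stdlib-only sketch maps each code in a patient's visit history (for example an ICD code) to a dense vector, summarizes the sequence by mean pooling, and concatenates the result with tabular claim features. Everything here is a hypothetical illustration: the function names, the random (untrained) embedding table, the embedding dimension, and the mean-pooling encoder are assumptions, and mean pooling is a deliberately simple, order-insensitive stand-in for whatever sequence encoder the proposed architectures actually use.

```python
import random


def make_embedding_table(vocab, dim, seed=0):
    """Hypothetical: one random vector per category code (e.g. an ICD code).
    In a trained model these vectors would be learned parameters."""
    rng = random.Random(seed)
    return {code: [rng.gauss(0.0, 0.1) for _ in range(dim)] for code in vocab}


def encode_visits(visit_codes, table, dim):
    """Mean-pool the embeddings of all codes in a patient's visit sequence.
    Codes missing from the table contribute a zero vector; an empty
    sequence yields the zero vector."""
    if not visit_codes:
        return [0.0] * dim
    total = [0.0] * dim
    for code in visit_codes:
        vec = table.get(code, [0.0] * dim)
        for i, v in enumerate(vec):
            total[i] += v
    return [t / len(visit_codes) for t in total]


def claim_features(visit_codes, tabular, table, dim):
    """Concatenate the sequence summary with tabular claim features
    (e.g. patient age, claim amount) into one input vector."""
    return encode_visits(visit_codes, table, dim) + list(tabular)


# Illustrative usage with three ICD-10 codes and two tabular features.
DIM = 4
table = make_embedding_table(["E11.9", "I10", "J45"], DIM)
x = claim_features(["E11.9", "I10"], [42.0, 1.0], table, DIM)
print(len(x))  # 6: 4 embedding dimensions + 2 tabular features
```

The appeal of embeddings for variables with very many categories is that each code costs only a short dense vector instead of one column per category in a one-hot encoding, so thousands of distinct codes remain tractable.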


Summary

Introduction

Fraud causes substantial costs and losses for the finance and insurance industries. Examples include fraudulent credit card transactions and insurance fraud. Fraud detection is a critical function and core competence in these industries and their claims management processes. The proliferation of digitization in finance and insurance has led to big datasets suited to fraud detection. We propose architectures for categorical sequence embeddings via deep learning that help improve the classification of fraudulent and valid claims compared to other machine learning methods. We review different ways to cope with the class imbalance problem that is typical for fraud detection. Researchers from many disciplines study this problem in application domains including time-series modeling [2], [3], predictive maintenance of technical systems [4], [5], and applications in the finance and insurance industries [6].
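One common way to cope with class imbalance is to reweight the training loss so that the rare fraudulent class carries as much total weight as the common legitimate class. The sketch below uses the "balanced" heuristic n_samples / (n_classes * n_samples_in_class), the same formula scikit-learn applies for `class_weight="balanced"`, together with a weighted binary cross-entropy. This is one standard technique among several (resampling is another); the excerpt does not say which approaches the paper ultimately adopts, and the function names are illustrative.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * n_samples_in_class),
    so every class contributes the same total weight to the loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def weighted_bce(labels, probs, weights, eps=1e-12):
    """Class-weighted binary cross-entropy, averaged over examples.
    probs are predicted probabilities of fraud (label 1), clipped to
    [eps, 1 - eps] to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= weights[y] * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# 90 legitimate claims and 10 fraudulent ones (toy data).
labels = [0] * 90 + [1] * 10
w = balanced_class_weights(labels)
print(w[1])  # 5.0: each fraudulent claim counts 9 times as much as a legitimate one
```

With these weights both classes sum to the same total (90 × 0.556 ≈ 10 × 5.0 = 50), so a classifier can no longer minimize the loss simply by predicting "legitimate" for every claim.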
