Medicare fraud detection using neural networks

Justin M Johnson,Taghi M Khoshgoftaar

doi:10.1186/s40537-019-0225-0

Abstract

Access to affordable healthcare is a nationwide concern that impacts a large majority of the United States population. Medicare is a Federal Government healthcare program that provides affordable health insurance to the elderly population and individuals with select disabilities. Unfortunately, there is a significant amount of fraud, waste, and abuse within the Medicare system that costs taxpayers billions of dollars and puts beneficiaries’ health and welfare at risk. Previous work has shown that publicly available Medicare claims data can be leveraged to construct machine learning models capable of automating fraud detection, but challenges associated with class-imbalanced big data hinder performance. With a minority class size of 0.03% and an opportunity to improve existing results, we use the Medicare fraud detection task to compare six deep learning methods designed to address the class imbalance problem. Data-level techniques used in this study include random over-sampling (ROS), random under-sampling (RUS), and a hybrid ROS–RUS. The algorithm-level techniques evaluated include a cost-sensitive loss function, the Focal Loss, and the Mean False Error Loss. A range of class ratios are tested by varying sample rates and desirable class-wise performance is achieved by identifying optimal decision thresholds for each model. Neural networks are evaluated on a 20% holdout test set, and results are reported using the area under the receiver operating characteristic curve (AUC). Results show that ROS and ROS–RUS perform significantly better than baseline and algorithm-level methods with average AUC scores of 0.8505 and 0.8509, while ROS–RUS maximizes efficiency with a 4× speedup in training time. Plain RUS outperforms baseline methods with up to 30× improvements in training time, and all algorithm-level methods are found to produce more stable decision boundaries than baseline methods. 
Thresholding results suggest that the decision threshold always be optimized using a validation set, as we observe a strong linear relationship between the minority class size and the optimal threshold. To the best of our knowledge, this is the first study to compare multiple data-level and algorithm-level deep learning methods across a range of class distributions. Additional contributions include a unique analysis of the relationship between minority class size and optimal decision threshold and state-of-the-art performance on the given Medicare fraud detection task.
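As a rough illustration of the hybrid ROS–RUS idea named in the abstract, the sketch below under-samples the majority (non-fraud) class and then over-samples the minority (fraud) class with replacement until a chosen class ratio is reached. The function name `ros_rus`, its parameters, the synthetic data, and the sample rates are all assumptions made for this example; they are not the authors' implementation or the settings used in the paper.

```python
import numpy as np

def ros_rus(X, y, neg_keep_frac, target_pos_frac, seed=0):
    """Hypothetical hybrid sampler: RUS on negatives, then ROS on positives."""
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)

    # RUS step: keep a random subset of the negatives, without replacement.
    n_neg = max(1, int(round(neg_keep_frac * neg_idx.size)))
    neg_sample = rng.choice(neg_idx, size=n_neg, replace=False)

    # ROS step: draw positives with replacement until they make up
    # target_pos_frac of the output:
    #   n_pos / (n_pos + n_neg) = t  =>  n_pos = t * n_neg / (1 - t)
    n_pos = int(round(target_pos_frac * n_neg / (1.0 - target_pos_frac)))
    pos_sample = rng.choice(pos_idx, size=n_pos, replace=True)

    # Shuffle so the two classes are interleaved in the training set.
    idx = rng.permutation(np.concatenate([neg_sample, pos_sample]))
    return X[idx], y[idx]

# Synthetic data with a 0.03% positive class (30 of 100,000 rows).
rng = np.random.default_rng(42)
X = rng.normal(size=(100_000, 4))
y = np.zeros(100_000, dtype=int)
y[:30] = 1

# Keep 10% of negatives, then over-sample positives to a 50:50 ratio.
X_bal, y_bal = ros_rus(X, y, neg_keep_frac=0.1, target_pos_frac=0.5)
print(X_bal.shape, y_bal.mean())
```

Because the negative class is cut down before the positive class is duplicated, the balanced training set is much smaller than one produced by over-sampling alone, which is consistent with the training-time advantage the abstract reports for ROS–RUS. Sampling of this kind would be applied to the training split only, never to the holdout test set.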

Highlights

Medicare is a United States (U.S.) healthcare program established and funded by the Federal Government that provides affordable health insurance to individuals 65 years and older, and other select individuals with permanent disabilities [1].
The combined and Part B data sets scored the best on receiver operating characteristic (ROC) area under the curve (AUC), and the logistic regression (LR) learner was shown to perform significantly better than Gradient Boosted Tree (GBT) and random forest (RF), with a maximum ROC AUC score of 0.816.
Since the true positive rate (TPR) and true negative rate (TNR) scores are each derived from just one class, i.e., the positive or negative class, they are insensitive to class imbalance.
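The last highlight can be checked with a small worked example. The sketch below computes TPR and TNR from hard predictions, then replicates every negative example 100 times to make the data far more imbalanced. The helper name `rates` and the toy labels are assumptions for illustration only; precision is included purely as a contrast and is not part of the highlighted claim.

```python
import numpy as np

def rates(y_true, y_pred):
    """Return (TPR, TNR, precision) for binary labels and predictions."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    # TPR uses only positive examples; TNR uses only negative examples.
    return tp / (tp + fn), tn / (tn + fp), tp / (tp + fp)

# Toy set: 10 positives (8 detected) and 90 negatives (9 false alarms).
y_true = np.array([1] * 10 + [0] * 90)
y_pred = np.array([1] * 8 + [0] * 2 + [1] * 9 + [0] * 81)
tpr_a, tnr_a, prec_a = rates(y_true, y_pred)

# Same classifier behaviour, but every negative is replicated 100 times.
neg = y_true == 0
y_true_b = np.concatenate([y_true[~neg], np.repeat(y_true[neg], 100)])
y_pred_b = np.concatenate([y_pred[~neg], np.repeat(y_pred[neg], 100)])
tpr_b, tnr_b, prec_b = rates(y_true_b, y_pred_b)

print(tpr_a, tnr_a, prec_a)
print(tpr_b, tnr_b, prec_b)
```

TPR and TNR are identical in both settings because each is a ratio taken within a single class, while precision mixes counts from both classes and falls as the negative class grows.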

Summary

Introduction

Medicare is a United States (U.S.) healthcare program established and funded by the Federal Government that provides affordable health insurance to individuals 65 years and older, and other select individuals with permanent disabilities [1]. There are many factors that drive the costs of healthcare and health insurance, including fraud, waste, and abuse (FWA) within the healthcare system. The Federal Bureau of Investigation (FBI) estimates that fraud accounts for 3–10% of all billings [4], and the Coalition Against Insurance Fraud [5] estimates that fraud costs all lines of insurance roughly $80 billion per year. Based on these estimates, Medicare is losing between $21 and $71 billion per year to FWA. Federal laws are in place to govern Medicare fraud and abuse, for example the False Claims Act (FCA) and the Anti-Kickback Statute [6].

Journal: Journal of Big Data	Publication Date: Jul 18, 2019
Citations: 71	License type: open-access
