Abstract

Access to affordable healthcare is a nationwide concern that impacts a large majority of the United States population. Medicare is a Federal Government healthcare program that provides affordable health insurance to the elderly population and individuals with select disabilities. Unfortunately, there is a significant amount of fraud, waste, and abuse within the Medicare system that costs taxpayers billions of dollars and puts beneficiaries’ health and welfare at risk. Previous work has shown that publicly available Medicare claims data can be leveraged to construct machine learning models capable of automating fraud detection, but challenges associated with class-imbalanced big data hinder performance. With a minority class size of 0.03% and an opportunity to improve existing results, we use the Medicare fraud detection task to compare six deep learning methods designed to address the class imbalance problem. Data-level techniques used in this study include random over-sampling (ROS), random under-sampling (RUS), and a hybrid ROS–RUS. The algorithm-level techniques evaluated include a cost-sensitive loss function, the Focal Loss, and the Mean False Error Loss. A range of class ratios are tested by varying sample rates and desirable class-wise performance is achieved by identifying optimal decision thresholds for each model. Neural networks are evaluated on a 20% holdout test set, and results are reported using the area under the receiver operating characteristic curve (AUC). Results show that ROS and ROS–RUS perform significantly better than baseline and algorithm-level methods with average AUC scores of 0.8505 and 0.8509, while ROS–RUS maximizes efficiency with a 4× speedup in training time. Plain RUS outperforms baseline methods with up to 30× improvements in training time, and all algorithm-level methods are found to produce more stable decision boundaries than baseline methods. Thresholding results suggest that the decision threshold always be optimized using a validation set, as we observe a strong linear relationship between the minority class size and the optimal threshold. To the best of our knowledge, this is the first study to compare multiple data-level and algorithm-level deep learning methods across a range of class distributions. Additional contributions include a unique analysis of the relationship between minority class size and optimal decision threshold and state-of-the-art performance on the given Medicare fraud detection task.

Highlights

  • Medicare is a United States (U.S.) healthcare program established and funded by the Federal Government that provides affordable health insurance to individuals 65 years and older, and other select individuals with permanent disabilities [1]

  • The combined and Part B data sets scored the best on receiver operating characteristics (ROC) area under the curve (AUC), and the logistic regression (LR) learner was shown to perform significantly better than Gradient Boosted Tree (GBT) and random forest (RF) with a maximum ROC AUC score 0.816

  • Since the true positive rate (TPR) and true negative rate (TNR) scores are each derived from just one class, i.e. the positive or negative class, they are insensitive to class imbalance

Read more

Summary

Introduction

Medicare is a United States (U.S.) healthcare program established and funded by the Federal Government that provides affordable health insurance to individuals 65 years and older, and other select individuals with permanent disabilities [1]. There are many factors that drive the costs of healthcare and health insurance, including fraud, waste, and abuse (FWA) within the healthcare system. The Federal Bureau of Investigation (FBI) estimates that fraud accounts for 3–10% of all billings [4], and the Coalition Against Insurance Fraud [5] estimates that fraud costs all lines of insurance roughly $80 billion per year. Based on these estimates, Medicare is losing between $21 and $71 billion per year to FWA. Federal laws are in place to govern Medicare fraud and abuse, for example the False Claims Act (FCA) and Anti-Kickback Statute [6]

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.