Abstract

In the United States, advances in technology and medical sciences continue to improve the general well-being of the population. With this continued progress, programs such as Medicare are needed to help manage the high costs associated with quality healthcare. Unfortunately, there are individuals who commit fraud for nefarious reasons and personal gain, limiting Medicare’s ability to effectively provide for the healthcare needs of the elderly and other qualifying people. To minimize fraudulent activities, the Centers for Medicare and Medicaid Services (CMS) released a number of “Big Data” datasets for different parts of the Medicare program. In this paper, we focus on the detection of Medicare fraud using the following CMS datasets: (1) Medicare Provider Utilization and Payment Data: Physician and Other Supplier (Part B), (2) Medicare Provider Utilization and Payment Data: Part D Prescriber (Part D), and (3) Medicare Provider Utilization and Payment Data: Referring Durable Medical Equipment, Prosthetics, Orthotics and Supplies (DMEPOS). Additionally, we create a fourth dataset which is a combination of the three primary datasets. We discuss data processing for all four datasets and the mapping of real-world provider fraud labels using the List of Excluded Individuals and Entities (LEIE) from the Office of the Inspector General. Our exploratory analysis on Medicare fraud detection involves building and assessing three learners on each dataset. Based on the Area under the Receiver Operating Characteristic (ROC) Curve performance metric, our results show that the Combined dataset with the Logistic Regression (LR) learner yielded the best overall score at 0.816, closely followed by the Part B dataset with LR at 0.805. Overall, the Combined and Part B datasets produced the best fraud detection performance with no statistical difference between these datasets, over all the learners. 
Therefore, based on our results and the assumption that there is no way to know within which part of Medicare a physician will commit fraud, we suggest using the Combined dataset for detecting fraudulent behavior when a physician has submitted payments through any or all Medicare parts evaluated in our study.
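The two evaluation ingredients named in the abstract, exclusion-list labeling and the AUC metric, can be illustrated with a minimal sketch. This is not the authors' pipeline: the provider IDs, the exclusion set, and the scores are hypothetical toy values, and the function names `label_providers` and `roc_auc` are introduced here only for illustration. The AUC is computed from its pairwise definition (the probability that a randomly chosen fraudulent provider receives a higher score than a randomly chosen non-fraudulent one), which is quadratic in the number of providers and suitable only for small examples.

```python
from typing import Dict, Iterable, List, Set


def label_providers(provider_ids: Iterable[str], excluded_ids: Set[str]) -> Dict[str, int]:
    """Label a provider 1 (fraud) if its ID appears in the exclusion list, else 0."""
    return {pid: int(pid in excluded_ids) for pid in provider_ids}


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: provider IDs, an exclusion list standing in for the LEIE,
# and fraud scores as they might come from any one learner.
providers = ["A", "B", "C", "D"]
excluded = {"C", "D"}
labels_by_id = label_providers(providers, excluded)
scores_by_id = {"A": 0.10, "B": 0.40, "C": 0.35, "D": 0.80}

y_true = [labels_by_id[p] for p in providers]
y_score = [scores_by_id[p] for p in providers]
print(roc_auc(y_true, y_score))  # 0.75: three of the four positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported scores of 0.816 and 0.805 should be read.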

Highlights

  • Healthcare in the United States (U.S.) is important in the lives of many citizens, but the high costs of health-related services leave many patients with limited medical care

  • The Combined dataset has the best overall Area under the ROC Curve (AUC), but the Part B dataset shows the lowest variation in fraud detection performance across learners, including the highest AUC scores for Gradient Boosted Trees (GBT) and Random Forest (RF)

  • Medicare is necessary for many citizens, hence the importance placed on quality research into fraud detection to keep healthcare costs fair and reasonable

Summary

Introduction

Healthcare in the United States (U.S.) is important in the lives of many citizens, but the high costs of health-related services leave many patients with limited medical care. There are a number of issues facing healthcare and medical insurance systems, such as a growing population or bad actors (i.e., fraudulent or potentially fraudulent physicians/providers), which reduce the funds allocated for these programs. Medicare accounts for 20% of all U.S. healthcare spending [8], with a total possible cost recovery (with the potential application of effective fraud detection methods) of $3.8 to $13 billion from Medicare alone. We focus on data within the Fee-For-Service system of Medicare, where the basic claims process consists of a physician (or other healthcare provider) performing one or more procedures and submitting a claim to Medicare for payment, rather than directly billing the patient. Additional information on the Medicare process and Medicare fraud is provided within [1, 11, 12, 13].

Herland et al. J Big Data (2018) 5:29
