Approaches for identifying U.S. Medicare fraud in provider claims data

Matthew Herland, Taghi M. Khoshgoftaar, Richard A. Bauder

doi:10.1007/s10729-018-9460-8

Abstract

Quality, affordable healthcare is an important aspect of people's lives, particularly as they age. The rising elderly population in the United States (U.S.), together with an increasing number of chronic diseases, implies continuing healthcare later in life and the need for programs, such as U.S. Medicare, to help with the associated medical expenses. Unfortunately, healthcare fraud adversely affects these programs, draining resources and reducing the quality and accessibility of necessary healthcare services. Detecting fraud is critical to identifying, and subsequently stopping, its perpetrators. Machine learning methods and data mining strategies can be leveraged to improve current fraud detection processes and reduce the resources needed to find and investigate possibly fraudulent activities. In this paper, we employ an approach that predicts a physician's expected specialty based on the type and number of procedures performed. From this approach, we generate a baseline model, comparing Logistic Regression and Multinomial Naive Bayes, in order to test and assess several new approaches to improving the detection of U.S. Medicare Part B provider fraud. Our results indicate that our proposed improvement strategies (specialty grouping, class removal, and class isolation), applied to different medical specialties, have mixed effects on the fraud detection performance of the selected Logistic Regression baseline model. Through our work, we demonstrate that improvements to current detection methods can be effective in identifying potential fraud.
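The central idea of predicting a provider's specialty from the counts of procedures they bill can be illustrated with a small, from-scratch Multinomial Naive Bayes classifier, one of the two models named in the abstract. This is a minimal sketch under stated assumptions, not the authors' pipeline: the provider records, the procedure labels (`echo`, `ekg`, `office`, `biopsy`), and the helper names (`train_mnb`, `predict`, `flag_mismatches`) are hypothetical, Laplace smoothing with `alpha=1.0` is an assumed default, and treating a disagreement between the predicted and the declared specialty as a reason to review a provider is an assumed reading of how such a model supports fraud detection.

```python
import math
from collections import Counter, defaultdict

# Hypothetical helper: fit a multinomial Naive Bayes model.
# providers is a list of (declared_specialty, {procedure: count}) pairs.
def train_mnb(providers, alpha=1.0):
    class_counts = Counter(spec for spec, _ in providers)
    proc_totals = defaultdict(Counter)  # specialty -> procedure -> summed count
    vocab = set()
    for spec, procs in providers:
        proc_totals[spec].update(procs)  # Counter.update adds the counts
        vocab.update(procs)
    n = len(providers)
    log_priors = {c: math.log(k / n) for c, k in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        # Laplace-smoothed denominator: all counts for class c plus alpha per procedure
        denom = sum(proc_totals[c].values()) + alpha * len(vocab)
        log_lik[c] = {p: math.log((proc_totals[c][p] + alpha) / denom) for p in vocab}
    return log_priors, log_lik, vocab

# Hypothetical helper: return the specialty with the highest posterior score.
def predict(model, procs):
    log_priors, log_lik, vocab = model
    scores = {
        c: prior + sum(cnt * log_lik[c][p] for p, cnt in procs.items() if p in vocab)
        for c, prior in log_priors.items()
    }
    return max(scores, key=scores.get)

# Hypothetical helper (assumed flagging rule): indices of providers whose
# predicted specialty differs from their declared one.
def flag_mismatches(model, providers):
    return [i for i, (spec, procs) in enumerate(providers) if predict(model, procs) != spec]

# Toy, made-up training data (not from the paper or from CMS).
train = [
    ("Cardiology", {"echo": 10, "ekg": 8}),
    ("Cardiology", {"echo": 7, "ekg": 12, "office": 2}),
    ("Dermatology", {"biopsy": 9, "office": 5}),
    ("Dermatology", {"biopsy": 6, "office": 8}),
]
model = train_mnb(train)

test_providers = [
    ("Cardiology", {"echo": 5, "ekg": 4}),      # billing looks like cardiology
    ("Cardiology", {"biopsy": 10, "office": 3}),  # billing looks like dermatology
]
print(flag_mismatches(model, test_providers))  # → [1]
```

Working in log space avoids underflow when a provider bills many procedures, and only the second toy provider is flagged because its procedure mix matches the other specialty. Swapping in Logistic Regression, the baseline the abstract ultimately selects, would change only the classifier, not the mismatch-based flagging idea sketched here.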
