Abstract

Insurance fraud ranks as the second most expensive crime in the United States, and healthcare fraud is the second most costly type of insurance fraud. Contrary to popular belief, insurance fraud is not a victimless crime: its cost is passed on to law-abiding citizens in the form of increased premiums, and it can cause serious harm or danger to beneficiaries. To combat this societal threat, there is a pressing need for healthcare fraud detection systems to evolve. Common roadblocks to bringing the digital advancements seen in other domains into healthcare are the complexity and heterogeneity of its data systems and the varied health program models across the United States. In other words, because of the sensitive nature of the domain, data are not stored in a centralized manner, making it difficult to implement a robust real-world fraud detection system. In addition to the complexity of the systems involved, certain evidentiary standards must also be met before a fraud actor can be prosecuted in a litigation setting; there is thus a human aspect to the real-world fraud detection workflow. This article outlines a novel framework that converts diverse prescription claims (both fee-for-service and managed care) into a set of input variables/features suitable for an advanced statistical fraud modeling framework. It thereby contributes to the existing literature by describing a process for transforming prescription claims data into secondary features specific to provider fraud detection. The core idea is to design the input features around three main aspects of fraud: business heuristics on claims, provider-to-prescriber relations, and providers' client populations. A systematic method is proposed to extract features with the potential to detect billing or behavioral outliers among pharmacy providers, using information drawn from a secondary database of outpatient prescriptions. The application of a commonly used dimensionality reduction method, Principal Component Analysis (PCA), is also evaluated; PCA reduces the extensive feature space to the components that capture the most variance in the data. To evaluate the extracted features, both the engineered features and their principal components are fed to out-of-the-box logistic regression and Random Forest models to identify potential fraud. Tested in different experimental settings, the logistic regression model achieved a highest area under the Receiver Operating Characteristic (ROC) curve of 0.76 and a weighted F score of 0.85, while the Random Forest model achieved a highest area under the curve of 0.74 and a weighted F score of 0.88.
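
To make the evaluation setup concrete, the following is a minimal sketch in Python (scikit-learn) of a PCA-plus-classifier pipeline scored with ROC AUC and weighted F score, mirroring the abstract's description. It is an illustration under stated assumptions, not the authors' implementation: the synthetic data stands in for the engineered provider-level claims features, and the 95% variance threshold and hyperparameters are hypothetical choices.

```python
# Sketch (not the authors' code) of the abstract's modeling pipeline:
# engineered provider features -> PCA -> logistic regression / random forest,
# scored by ROC AUC and weighted F1. Synthetic data stands in for the real
# (non-public) prescription-claims features; class weights mimic rare fraud.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for engineered provider features X and fraud labels y.
X, y = make_classification(n_samples=5000, n_features=60, n_informative=15,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
}

for name, clf in models.items():
    # Standardize, then keep the principal components explaining 95% of the
    # variance (an assumed threshold), as in the abstract's use of PCA.
    pipe = make_pipeline(StandardScaler(), PCA(n_components=0.95), clf)
    pipe.fit(X_train, y_train)
    scores = pipe.predict_proba(X_test)[:, 1]
    preds = pipe.predict(X_test)
    print(f"{name}: ROC AUC={roc_auc_score(y_test, scores):.2f}, "
          f"weighted F1={f1_score(y_test, preds, average='weighted'):.2f}")
```

The same pipeline can be run a second time without the PCA step to compare the raw engineered features against their principal components, which is the comparison the abstract describes.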
