Impact of Density of Lab Data in EHR for Prediction of Potentially Preventable Events

Chandrima Sarkar, Jaideep Srivastava

doi:10.1109/ichi.2013.82

Abstract

This paper presents an analysis of sparse and incomplete Electronic Health Record (EHR) data for predicting patients at risk of Potentially Preventable Events (PPEs). PPEs are admissions, readmissions, complications, and emergency department visits that could have been avoided had the patient received appropriate interventions. Machine learning techniques have made PPE detection less difficult, but the task remains challenging because EHR data are sparse and incomplete. It is therefore important to investigate the factors that affect PPE prediction in EHR data. In this paper we define the term density to evaluate the sparse and incomplete nature of an EHR data set. We analyze three important factors that affect PPE prediction in sparse and incomplete EHR data: 1) the effect of varying the domain information in patient records, 2) the impact of a popular data mining pre-processing technique, rank aggregation based feature selection, and 3) the effect of ensemble classification. The results indicate that rank aggregation based feature selection and ensemble classification improve classification accuracy by approximately 3-4% despite the sparse and incomplete nature of the data. However, eliminating patient records with less domain information in order to reduce incompleteness does not improve classification accuracy. We conclude that feature selection and ensemble classification are important factors that affect classification accuracy even in sparse and incomplete data sets. We also conclude that randomly decreasing domain information by varying lab values does not help increase the accuracy of PPE prediction.

Full Text