Abstract
Disease and symptom diagnostic codes are a valuable resource for classifying and predicting patient outcomes. In this paper, we propose a novel methodology for utilizing disease diagnostic information in a predictive machine learning framework. Our methodology relies on a novel, clustering-based feature extraction framework using disease diagnostic information. To reduce the data dimensionality, we identify disease clusters using co-occurrence statistics. We optimize the number of generated clusters in the training set and then utilize these clusters as features to predict patient severity of condition and patient readmission risk. We build our clustering and feature extraction algorithm using the 2012 National Inpatient Sample (NIS), Healthcare Cost and Utilization Project (HCUP) which contains 7 million hospital discharge records and ICD-9-CM codes. The proposed framework is tested on Ronald Reagan UCLA Medical Center Electronic Health Records (EHR) from 3041 Congestive Heart Failure (CHF) patients and the UCI 130-US diabetes dataset that includes admissions from 69,980 diabetic patients. We compare our cluster-based feature set with the commonly used comorbidity frameworks including Charlson's index, Elixhauser's comorbidities and their variations. The proposed approach was shown to have significant gains between 10.7–22.1% in predictive accuracy for CHF severity of condition prediction and 4.65–5.75% in diabetes readmission prediction.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.