Abstract

We examine a machine learning approach for deriving insights from observational healthcare data in order to improve public health. Our goal is to simultaneously identify patient subpopulations with differing health risks and find the distinct risk factors or determinants associated with each subpopulation. To this end, we develop a supervised Gaussian Mixture Model (GMM) approach for subpopulation modeling that combines GMMs with L1-regularized logistic regression. We demonstrate the approach on an analysis of the drivers of high-cost Medicaid expenditures for inpatient stays associated with the Newborn, Pregnancy, and Circulatory Systems diagnostic categories. These conditions were chosen because they had the highest total inpatient expenditures in New York State (NYS) in 2016. Compared with state-of-the-art learning methods (random forests, boosting, deep learning), our approach provides comparable prediction performance while also extracting insightful explanations of the subpopulation structure and the risk factors within each subpopulation. Applying unsupervised learning methods first and logistic regression second fails to yield equally meaningful results: the unsupervised subpopulations are homogeneous and moderately predictable, while some of our subpopulations are highly predictable with easy-to-identify drivers of cost. Focusing on newborns, we unveil subpopulations indicative of the landscape of healthcare in NYS: about 90% of the discharges are healthy New York City babies and about 1% are costly complex cases. The subpopulations also indicate regional disparities: for example, newborns from Central, Southern, and Western NY are at higher risk for high-cost stays associated with substance abuse. The results indicate the promise of the approach for future population health studies based on electronic health care records.
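To make the comparison concrete, the following is a minimal sketch of the sequential baseline that the abstract contrasts with the proposed method: an unsupervised GMM finds the subpopulations, and an L1-penalized logistic regression is then fitted inside each one. It is not the authors' supervised GMM, whose details the abstract does not give. The function names, the synthetic data, and the hyperparameters (`n_components=2`, `C=1.0`) are all assumptions made for illustration, and scikit-learn is assumed to be available.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture


def fit_sequential_baseline(X, y, n_components=2, C=1.0, seed=0):
    """Cluster with an unsupervised GMM, then fit one L1-logistic model per cluster."""
    gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(X)
    labels = gmm.predict(X)
    models = {}
    for k in range(n_components):
        mask = labels == k
        # A cluster that is empty or has a single outcome class cannot support a classifier.
        if mask.sum() == 0 or np.unique(y[mask]).size < 2:
            continue
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        models[k] = clf.fit(X[mask], y[mask])
    return gmm, labels, models


def risk_factors(models, feature_names, tol=1e-8):
    """Per cluster, list the features whose L1-penalized coefficient is non-zero."""
    return {
        k: [name for name, w in zip(feature_names, clf.coef_.ravel()) if abs(w) > tol]
        for k, clf in models.items()
    }


# Synthetic stand-in data (hypothetical): two well-separated groups whose
# binary outcome is driven by a different feature in each group.
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[-4.0, 0.0, 0.0], scale=1.0, size=(300, 3))
X1 = rng.normal(loc=[4.0, 0.0, 0.0], scale=1.0, size=(300, 3))
y0 = (X0[:, 1] + 0.3 * rng.normal(size=300) > 0).astype(int)  # driven by f1
y1 = (X1[:, 2] + 0.3 * rng.normal(size=300) > 0).astype(int)  # driven by f2
X = np.vstack([X0, X1])
y = np.concatenate([y0, y1])

feature_names = ["f0", "f1", "f2"]
gmm, labels, models = fit_sequential_baseline(X, y)
drivers = risk_factors(models, feature_names)
print(drivers)
```

In this baseline the clustering step never sees the outcome, so the clusters are not guaranteed to align with differences in risk. That is the limitation the abstract attributes to the sequential approach, and it is the reason the authors couple the mixture model with the supervised objective.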
