Abstract
Homelessness is poorly captured in most administrative data sets, making it difficult to understand how, when, and where this population can be better served. This study sought to develop and validate a classification model of homelessness. Our sample included 5,050,639 individuals aged 11 years and older who were included in a linked dataset of administrative records from multiple state-maintained databases in Massachusetts for the period from 2011 to 2015. We used logistic regression to develop a classification model with 94 predictors and subsequently tested its performance. The model had high specificity (95.4%), moderate sensitivity (77.8%) for predicting known cases of homelessness, and excellent classification properties (area under the receiver operating characteristic curve 0.94; balanced accuracy 86.4%). To demonstrate how such a modeling approach could be used to target interventions that mitigate the risk of an adverse health outcome, we also estimated the association between model-predicted homeless status and fatal opioid overdoses, finding that model-predicted homeless status was associated with a nearly 23-fold increase in the risk of fatal opioid overdose. This study provides a novel approach for identifying homelessness using integrated administrative data. The strong performance of our model underscores the potential value of linking data from multiple service systems to improve the identification of housing instability and to assist government in developing programs that seek to improve health and other outcomes for homeless individuals.
Highlights
Homelessness is associated with a wide range of adverse social, economic and health outcomes [1,2,3]
Based on ICD codes, the numbers of individuals identified as homeless in each of the datasets used to construct this measure were as follows: 23,239 in the All-Payer Claims Database (APCD) dataset, 21,722 in the CaseMix dataset, 300 in the DMH dataset, 3,237 in the MATRIS dataset, and 6,704 identified through the Prescription Monitoring Program (PMP)
Applying the parameters of the classification model estimated on the downsampled development sample to the validation sample yielded an AUC of 0.94, which is in the excellent range
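The performance measures quoted above (sensitivity, specificity, balanced accuracy, and AUC) can be illustrated with a minimal sketch. It is not the authors' code: the labels and scores below are invented toy values, the 0.5 threshold is an arbitrary choice, and balanced accuracy is taken here as the mean of sensitivity and specificity, a common definition that the source does not confirm. AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC: share of positive/negative pairs
    in which the positive case has the higher score; ties count 0.5."""
    pos: List[float] = [s for t, s in zip(y_true, scores) if t == 1]
    neg: List[float] = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = known homeless, 0 = not identified as homeless.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]

# Arbitrary illustrative threshold for turning scores into class labels.
threshold = 0.5
y_pred = [1 if s >= threshold else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
balanced_acc = (sens + spec) / 2  # assumed definition, see lead-in
auc = roc_auc(y_true, scores)

print(f"sensitivity={sens:.3f} specificity={spec:.3f} "
      f"balanced_accuracy={balanced_acc:.3f} auc={auc:.3f}")
```

The pairwise AUC loop is quadratic in sample size, which is fine for a toy example but would need a rank-based implementation for a validation sample of realistic size.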
Summary
Homelessness is associated with a wide range of adverse social, economic and health outcomes [1,2,3]. Many service systems do not capture information about housing status in a reliable manner, despite the potential importance of such information for tailoring service delivery to those experiencing housing instability. Recognition of this shortcoming has led to increased interest in developing predictive models to identify persons experiencing homelessness using available data in administrative records. Much of this work has been conducted in health care systems, where studies have used indicators obtained from medical records, including diagnosis codes [8], address information [9, 10], and free text notes [11,12,13], to develop models identifying persons experiencing homelessness. These studies are limited by their exclusive reliance on data obtained from medical records, their limited sets of predictor variables, and their non-representative samples of individuals.