Abstract

Research Objective

Addressing social determinants of health such as homelessness is increasingly the focus of efforts to reduce costs and improve health outcomes. Yet such efforts face significant barriers, including how to identify, target, and evaluate homeless populations for better care using administrative data. Some studies have used diagnosis codes or addresses, but the sensitivity and specificity of these methods cannot be verified. We developed an algorithm to identify homeless beneficiaries from Medicaid enrollment and claims data and publicly available data using machine learning techniques. The algorithm was trained using homeless status data from the California Section 1115 Medicaid waiver demonstration Whole Person Care (WPC) Pilot program.

Study Design

WPC Pilots determined enrollees' homelessness status using a variety of methods, including data from established homeless information management systems, standardized assessment tools, and individual self‐report. Using Medicaid monthly enrollment and claims data, we constructed multiple indicators for homelessness based on International Classification of Diseases (ICD) diagnosis codes, place of service, and residential addresses (we flagged addresses that were invalid, contained specific keywords such as "homeless" or "bridge," or were places of service rather than residences). We developed an algorithm to predict homelessness using these indicators along with zip code‐level socioeconomic variables (eg, unemployment rate), demographic variables from enrollment data (eg, age, gender, race/ethnicity), health status variables from claims data (eg, Chronic Illness and Disability Payment System (CDPS) scores, number of chronic diseases), and health care utilization variables (eg, emergency room visits, hospitalizations).
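The address-based indicators described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: only the keywords "homeless" and "bridge" come from the abstract, while the function name `address_flags`, the `service_addresses` lookup set, and the crude "no digit means invalid" rule are hypothetical assumptions.

```python
import re

# Keywords named in the abstract; a real keyword list would likely be longer (assumption).
HOMELESS_KEYWORDS = ("homeless", "bridge")


def address_flags(address, service_addresses=frozenset()):
    """Return three boolean indicators for one residential address string.

    service_addresses is a hypothetical set of known place-of-service
    addresses, already lowercased with single spaces between words.
    """
    # Normalize case and whitespace so comparisons are consistent.
    normalized = re.sub(r"\s+", " ", (address or "").strip().lower())
    return {
        # Crude validity rule (assumption): empty or containing no digit.
        "invalid": not normalized or not any(ch.isdigit() for ch in normalized),
        # Whole-word match against the keyword list.
        "keyword": any(
            re.search(rf"\b{re.escape(word)}\b", normalized)
            for word in HOMELESS_KEYWORDS
        ),
        # Address matches a known service site rather than a residence.
        "place_of_service": normalized in service_addresses,
    }
```

Each flag would then enter the model as a separate binary feature alongside the ICD-based and place-of-service indicators.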
We used supervised machine learning techniques that included logistic regression, partial least squares, multivariate adaptive regression splines (MARS), and random forests.

Population Studied

A total of 95 171 patients were enrolled in the WPC program with complete demographic variables, of whom 40% were reported as homeless by the Pilots.

Principal Findings

We chose the random forest algorithm because it outperformed the other techniques. The algorithm identified 25 350 enrollees as homeless and provided a predictive accuracy of 88.75%, sensitivity of 81.45%, and specificity of 93.66%. It performed better than a simple logistic regression (sensitivity = 79.57%) and than relying solely on ICD codes (sensitivity = 25.55%). Compared with enrollees identified as homeless by both the Pilots and our algorithm, those identified as homeless by the Pilots but not our algorithm (n = 4969) were younger and more often female, black, Latino, and from Los Angeles, Contra Costa, and Alameda counties. Those identified as homeless by our algorithm but not the Pilots (n = 2526) were younger and more often male, Latino, and from Los Angeles County.

Conclusions

Our methods were relatively accurate in identifying homelessness using available administrative data and other public data sources. The discordance between our algorithm and the Pilots' homelessness status was likely due to variations in the Pilots' approaches, their ability to enroll homeless beneficiaries, and their use of other available information from medical records or in‐person contacts.

Implications for Policy or Practice

The ability to identify these patients with administrative and public data is critical to the success of Medicaid and other programs in managing the needs of these patients, providing needed services, and monitoring the progress of such efforts.

Primary Funding Source

California Department of Health Care Services.
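For readers less familiar with the metrics reported under Principal Findings, the sketch below shows how accuracy, sensitivity, and specificity follow from comparing predicted labels against a reference label (here, the Pilot-reported status). The function name `classification_metrics` and the toy labels are hypothetical and are not study data.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for binary labels (1 = homeless)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # flagged by both
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # reference only
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # algorithm only
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # flagged by neither
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of reference-homeless that were detected
        "specificity": tn / (tn + fp),  # share of reference-not-homeless correctly cleared
    }


# Toy labels for illustration only (not study data).
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
result = classification_metrics(reference, predicted)
```

In this toy case one reference-homeless enrollee is missed and one non-homeless enrollee is flagged, the two kinds of discordance that the abstract reports as n = 4969 and n = 2526 respectively.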
