Abstract
In the United States, which has no national health care system, All-Payer Claims Databases are a valuable resource for investigating and addressing disparities in access to, utilization of, and outcomes of care. Missing race/ethnicity information, however, is a bottleneck to their use. In most health claims databases, race/ethnicity is observed for only 3-5% of the observations, which creates a severe missing-data problem. We attempt to recover race/ethnicity information for the incomplete observations using models built on the complete observations (3%). To emulate this data structure, we analyze birth records from Connecticut, where race/ethnicity information is complete, in order to assess the performance of competing models. While the proposed full logistic model fitted to the Connecticut data achieves over 80% prediction accuracy, we are interested in comparing its performance with that of more complex machine learning methods and in evaluating their predictions. An empirical study is presented.