Abstract

This study compares the performance of logistic regression and random forest classifiers under a balanced oversampling procedure for predicting which households will face catastrophic out-of-pocket (OOP) health expenditure. Data were derived from the nationally representative household budget survey collected by the Turkish Statistical Institute for the year 2012. A total of 9,987 households returned valid surveys. The data set was highly imbalanced, and the percentage of households facing catastrophic OOP health expenditure was 0.14. Balanced oversampling was performed, and 30 artificial data sets were generated with sizes of 5% and 98% of the original data size. The balanced oversampled data sets provided accurate predictions, and random forest exhibited superior performance in identifying households facing catastrophic OOP health expenditure (area under the receiver operating characteristic curve, AUC = 0.8765; classification accuracy, CA = 0.7936; sensitivity = 0.7765; specificity = 0.8552; F1 = 0.7797).
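The abstract does not spell out the resampling algorithm, so the following is only a minimal sketch of one common reading of balanced oversampling: draw rows with replacement so that each class contributes equally to an artificial data set of a chosen size. It also shows how sensitivity, specificity, classification accuracy, and F1 are computed from binary predictions. The function names `balanced_oversample` and `binary_metrics`, the toy data, and the seed are illustrative assumptions, not the authors' implementation.

```python
import random


def balanced_oversample(rows, labels, target_size, seed=0):
    """Return a class-balanced resample of roughly target_size rows.

    Assumption: every class receives target_size // n_classes rows,
    drawn with replacement from that class's original members.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    per_class = target_size // len(by_class)
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        for _ in range(per_class):
            out_rows.append(rng.choice(members))
            out_labels.append(label)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, accuracy, and F1 (positive class = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": ratio(tp + tn, len(pairs)),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    # Toy imbalanced data (hypothetical): 10 positives among 1,000 rows.
    rows = list(range(1000))
    labels = [1 if i < 10 else 0 for i in rows]
    new_rows, new_labels = balanced_oversample(rows, labels, target_size=500)
    print(new_labels.count(1), new_labels.count(0))
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

In a comparison like the one described, each of the artificial data sets would be used to fit both classifiers, and the metrics above (together with AUC) would be averaged over the repetitions.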

