Prediction of effective sociodemographic variables in modeling health literacy: A machine learning approach

Feyza İnceoğlu, Serdar Deniz, Fatma Hilal Yagin

doi:10.1016/j.ijmedinf.2023.105167

Abstract

Introduction: Health literacy (HL) is becoming increasingly important for the effective use of health systems. The main purpose of this study is to use machine learning methods to rank the importance of sociodemographic variables and thereby identify the main factors affecting health literacy.

Material and methods: A total of 1001 participants with a mean age of 18.05 ± 0.81 years (mean ± standard deviation) were included in the study. The European Health Literacy Scale was used to determine the participants' health literacy level. With a scale cut-off point of 25, 516 (51.5%) of the participants had low health literacy and 485 (48.5%) had high health literacy. XGBoost, random forest, and logistic regression models were fitted, and performance metrics were calculated for each.

Results: Of the XGBoost, random forest, and logistic regression models, XGBoost performed best. Its sensitivity, specificity, F1-score, AUROC, and Brier score were 0.979, 0.965, 0.973, 0.983, and 0.054, respectively.

Conclusion: HL levels differed significantly by gender, age, class, family education, place of residence, economic situation, and covering health expenses (p < 0.05). According to the XGBoost model, the most important variable was reading the newspaper, and the least important was the educational status of the mother. The established model identified the basic variables that affect the HL level. The designed model will form the basic step of a supporting system design to improve physician-patient communication.
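For readers unfamiliar with the five metrics reported for the classifiers, the following is a minimal sketch of how each can be computed for a binary low/high HL prediction. It is not the authors' code: the function names are hypothetical, the choice of which class is coded as 1 is an assumption, and the toy labels and probabilities are made up for illustration and are not study data.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True positive rate: tp / (tp + fn)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True negative rate: tn / (tn + fp)."""
    _, tn, fp, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall: 2tp / (2tp + fp + fn)."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn)


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (illustrative only): 1 = positive class, 0 = negative class.
    y_true = [1, 1, 1, 0, 0, 0]
    y_prob = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
    y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 decision threshold

    print("sensitivity:", round(sensitivity(y_true, y_pred), 3))
    print("specificity:", round(specificity(y_true, y_pred), 3))
    print("F1-score:   ", round(f1_score(y_true, y_pred), 3))
    print("AUROC:      ", round(auroc(y_true, y_prob), 3))
    print("Brier score:", round(brier_score(y_true, y_prob), 3))
```

Sensitivity, specificity, and F1-score depend on thresholded class predictions, whereas AUROC and the Brier score are computed from the predicted probabilities, so the two groups of metrics describe different aspects of model performance.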

Full Text