Abstract
This empirical study assessed the feasibility of developing a machine-learning model to identify children and adolescents with poor oral health using only self-reported survey data. Such a model could enable scalable, cost-effective screening and targeted interventions, making better use of limited resources to improve oral health outcomes. To train and test the model, we used data from 2,133 students attending schools in a Portuguese municipality. Poor oral health (the dependent variable) was defined as having a Decayed, Missing, and Filled Teeth index for deciduous teeth (dmft) or permanent teeth (DMFT) at or above an expert-defined threshold (dmft/DMFT ≥ 3 or ≥ 4). The survey provided information about the students' oral health habits, knowledge, and beliefs, as well as their dietary and physical activity habits, which served as independent variables. Logistic regression models with variables selected through low-variance filtering and recursive feature elimination outperformed models trained with more complex machine learning algorithms on the precision@k metric, and they also outperformed random selection and expert rule-based models in identifying students with poor oral health. The proposed models are inherently explainable and broadly applicable, which, given the context, could compensate for their lower performance (Area Under the Curve = 0.64-0.70) compared with similar approaches and models. This study is one of the few in oral health care that includes bias auditing of classification models. The audit surfaced potential biases related to demographic factors such as age and social assistance status. Addressing these biases without significantly compromising model performance remains a challenge. The results confirm the feasibility of survey-based machine learning models for identifying individuals with poor oral health, but further validation of this approach and pilot testing in field trials are necessary.
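The precision@k metric used to compare models can be illustrated with a minimal sketch. It assumes the usual definition (the share of truly positive cases among the k individuals with the highest predicted risk) and is not the study's own implementation. The function name `precision_at_k`, the toy labels, and the toy scores are hypothetical and chosen only for illustration.

```python
def precision_at_k(y_true, scores, k):
    """Fraction of true positives among the k highest-scored individuals."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of individuals")
    # Rank indices by descending predicted risk; sorted() is stable,
    # so tied scores keep their original order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_k = ranked[:k]
    return sum(y_true[i] for i in top_k) / k


# Hypothetical toy data: 1 = poor oral health, 0 = otherwise.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
# Hypothetical model scores (e.g. predicted probabilities).
scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.6, 0.3, 0.4]

# Precision among the 4 students the model would flag first.
model_p = precision_at_k(y_true, scores, k=4)

# Expected precision of random selection equals the prevalence.
baseline = sum(y_true) / len(y_true)

print(f"precision@4 = {model_p:.3f}, random baseline = {baseline:.3f}")
```

In a screening setting, k would correspond to the number of students that available resources allow to be examined, which is why a ranking metric of this kind suits the comparison against random selection.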