Does multidimensional daily information predict the onset of myopia? A 1-year prospective cohort study

Wei Peng, Jingcheng Chen, Shaoming Sun, Fei Wang, Yining Sun, Mu Wang

doi:10.1186/s12938-023-01109-8

Abstract

Purpose: This study aimed to develop an interpretable machine learning model to predict the onset of myopia based on individual daily information.

Method: This was a prospective cohort study. At baseline, non-myopic children aged 6–13 years were recruited, and individual data were collected by interviewing students and parents. One year after baseline, the incidence of myopia was evaluated using a visual acuity test and cycloplegic refraction measurement. Five algorithms (Random Forest, Support Vector Machines, Gradient Boosting Decision Tree, CatBoost and Logistic Regression) were used to develop different models, and their performance was validated by the area under the curve (AUC). Shapley Additive exPlanations was applied to interpret the model output at both the individual and global levels.

Result: Of 2221 children, 260 (11.7%) developed myopia within 1 year. In univariable analysis, 26 features were associated with myopia incidence. The CatBoost algorithm had the highest AUC, 0.951, in model validation. The top 3 features for predicting myopia were parental myopia, grade and frequency of eye fatigue. A compact model using only 10 features was validated with an AUC of 0.891.

Conclusion: Daily information provided reliable predictors of the onset of childhood myopia. The interpretable CatBoost model showed the best prediction performance. Oversampling greatly improved model performance. This model could serve as a tool in myopia prevention and intervention: it can help identify children who are at risk of myopia and support personalized prevention strategies based on the contribution of each risk factor to the individual prediction result.
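The two evaluation ideas named in the abstract, AUC and oversampling, can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say which oversampling technique was used, so plain random oversampling is assumed here, and the function names `auc` and `random_oversample` are hypothetical. Only the class counts (260 incident cases among 2221 children) come from the abstract.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def random_oversample(rows, labels, seed=0):
    """Assumed technique: duplicate randomly chosen minority-class rows
    until both classes have the same number of samples."""
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    if len(pos_idx) < len(neg_idx):
        minority, majority = pos_idx, neg_idx
    else:
        minority, majority = neg_idx, pos_idx
    if not minority:
        raise ValueError("oversampling needs at least one minority sample")
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Class balance reported in the abstract: 260 myopic, 1961 non-myopic.
labels = [1] * 260 + [0] * (2221 - 260)
rows = list(range(len(labels)))  # placeholder feature rows
balanced_rows, balanced_labels = random_oversample(rows, labels)
incidence = 260 / 2221  # about 0.117, i.e. the reported 11.7%
```

In a sketch like this, oversampling would normally be applied to the training split only, so that duplicated cases cannot leak into the validation data and inflate the AUC.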

Full Text