Background and aims: The high prevalence of non-suicidal self-injury (NSSI) among adolescents is a global health issue. However, current prediction models for adolescent NSSI rely on a limited set of algorithms, resulting in biased predictions. This study therefore aimed to develop multiple machine learning models to improve prediction accuracy and reduce bias when predicting NSSI among Chinese adolescents.
Methods: A total of 4487 junior and senior high school students in China were recruited. Predictive models were built with multiple algorithms: logistic regression, decision tree, support vector machine, Naive Bayes, multi-layer perceptron, K-nearest neighbors, and ensemble learning algorithms such as random forest, bagging, AdaBoost, and stacking. Data processing techniques, including standardization and the synthetic minority oversampling technique (SMOTE), were employed to optimize the predictive models. The models were trained on 70% of the data, with the remaining 30% reserved for testing.
Results: All ten prediction models performed well, with area under the receiver operating characteristic curve (AUC) scores above 0.700 in the test set. The stacking and random forest models achieved AUC scores of 0.904 and 0.898, respectively. The Naive Bayes model performed relatively poorly. The five most important variables were resilience, bullying, suicidal ideation, internet addiction, and depression.
Conclusions: Ensemble machine learning algorithms showed promising results in predicting NSSI among adolescents. Such algorithms should be recommended for future NSSI research to enhance predictive accuracy. Identifying the important features in NSSI prediction can help develop screening protocols and lay a foundation for clinical diagnosis and intervention in adolescent populations.
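Two of the techniques named in the Methods and Results, the synthetic minority oversampling technique and the AUC metric, can be illustrated with a minimal standard-library Python sketch. This is not the authors' implementation: the function names `smote` and `auc`, the parameter defaults, and the toy data are all assumptions made for illustration. The oversampler is a simplified version of the idea (interpolating between a minority sample and one of its nearest minority neighbours), and the abstract does not state which library or settings were used.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE sketch (hypothetical helper, not the study's code).

    Creates n_new synthetic rows, each lying on the line segment between a
    randomly chosen minority row and one of its k nearest minority neighbours.
    Requires at least two minority rows.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to x.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(x, nb)])
    return synthetic


def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case is scored
    above a randomly chosen negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, purely illustrative.
minority_rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_rows = smote(minority_rows, n_new=4, k=2)
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In practice, oversampling is usually applied only to the training split so that the held-out test data keep their original class balance; the abstract does not specify how this was handled in the study.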