Abstract

Objectives: Since the launch of a nationwide general health check-up and instruction program in Japan in 2008, interest has grown in strategies that use predictive analytics to improve implementation of the program. We investigated the performance of prediction models developed to identify individuals classified as “requiring instruction” (high-risk) who were unlikely to participate in a health intervention program.

Methods: Data were obtained from one large health insurance union in Japan. The study population included individuals who underwent at least one general health check-up between 2008 and 2013 and were identified as “requiring instruction” in 2013. We developed three prediction models based on the gradient boosted trees (GBT), random forest (RF), and logistic regression (LR) algorithms using machine-learning techniques and compared the areas under the curve (AUC) of the developed models with those of two conventional methods. The aim of the models was to identify at-risk individuals who were unlikely to participate in the instruction program in 2013 after being classified as requiring instruction at their general health check-up that year.

Results: We first performed the analysis using data without multiple imputation. The AUC values for the GBT, RF, and LR prediction models and conventional methods 1 and 2 were 0.893 (95%CI: 0.882–0.905), 0.889 (95%CI: 0.877–0.901), 0.885 (95%CI: 0.872–0.897), 0.784 (95%CI: 0.767–0.800), and 0.757 (95%CI: 0.741–0.773), respectively. We then performed the analysis using data after multiple imputation. The AUC values for the GBT, RF, and LR prediction models and conventional methods 1 and 2 were 0.894 (95%CI: 0.882–0.906), 0.889 (95%CI: 0.887–0.901), 0.885 (95%CI: 0.872–0.898), 0.784 (95%CI: 0.767–0.800), and 0.757 (95%CI: 0.741–0.773), respectively. In both analyses, the GBT model showed the highest AUC of all the models, and statistically significant differences were found in comparison with the LR model, conventional method 1, and conventional method 2.

Conclusion: The prediction models using machine-learning techniques outperformed the existing conventional methods for predicting participation in the instruction program among participants identified as “requiring instruction” (high-risk).
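To make the reported metric concrete, the following is a minimal sketch of how an AUC with a 95% confidence interval can be computed from predicted scores. It assumes that AUC here means the area under the ROC curve and uses a percentile bootstrap for the interval; the abstract does not state how the intervals were obtained, so this is an illustration and not the authors' procedure. The function names `auc` and `bootstrap_ci` and the synthetic data are hypothetical.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample with a single class has no defined AUC
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Synthetic stand-in for model output: 1 = did not participate (assumption).
rng = random.Random(42)
labels = [rng.randint(0, 1) for _ in range(200)]
scores = [y + rng.gauss(0, 1) for y in labels]

point = auc(labels, scores)
low, high = bootstrap_ci(labels, scores)
print(f"AUC {point:.3f} (95%CI: {low:.3f}-{high:.3f})")
```

The same two functions could be applied to the scores from each model in turn to produce a side-by-side comparison. Testing whether two AUCs differ significantly requires a separate paired test, which this sketch does not cover.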
