This paper proposes using machine learning models to predict a person's future risk of hypertension from the routine health checkup data of their current and past visits to a health checkup center. The large-scale, high-dimensional dataset used in this study comes from the MJ Health Research Foundation in Taiwan. The training data are split into 5 folds and used to train 5 models in a 5-fold cross-validation manner; when predicting the test set, the result of voting among the 5 models is taken as the final prediction. Experimental results show that our models achieve a precision of 69.59%, a recall of 77.90%, and an F1-score of 73.51%, outperforming a baseline that uses only the blood pressure from each visitor's last visit. Experiments also show that visitors who undergo health checkups more often can be predicted more accurately, and that models trained with selected important factors achieve better results than those trained with the Framingham risk score. We also demonstrate the possibility of using our models to recommend weight control to visitors by adding virtual visits to the model input, which assume that the visitor's body weight can be reduced in the near future. Experimental results show that around 5.48% of the true-positive cases with a high Body Mass Index (BMI) are reclassified as negative, and this proportion shows a rising trend as more virtual visits are added, which may be used to advise visitors that controlling their body weight for a longer period leads to a lower probability of developing hypertension in the future.
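The fold-ensemble scheme described above (train one model per cross-validation split, then let the models vote on each test case) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the voting rule is assumed to be a simple majority over binary predictions, and `train_stump` is a hypothetical stand-in learner (a threshold on one feature, such as a last-visit systolic blood pressure) used only so the sketch runs without any ML library. The `f1` helper shows how the reported F1-score follows from the reported precision and recall.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[List[float], int]      # (feature vector, label: 1 = future hypertension)
Model = Callable[[List[float]], int]  # maps a feature vector to a 0/1 prediction


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_fold_ensemble(data: Sequence[Sample],
                        train_fn: Callable[[Sequence[Sample]], Model],
                        k: int = 5) -> List[Model]:
    """Train k models, each on all data except one held-out fold."""
    models = []
    for held_out in k_fold_indices(len(data), k):
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        # The held-out fold would normally serve as a validation set.
        models.append(train_fn(train))
    return models


def vote(models: Sequence[Model], x: List[float]) -> int:
    """Assumed majority vote over binary predictions (k=5 is odd, so no ties)."""
    votes = Counter(m(x) for m in models)
    return 1 if votes[1] > votes[0] else 0


def train_stump(train: Sequence[Sample]) -> Model:
    """Hypothetical stand-in learner: threshold on feature 0 maximizing training accuracy."""
    best_t, best_acc = None, -1.0
    for t in sorted({x[0] for x, _ in train}):
        acc = sum((x[0] >= t) == bool(y) for x, y in train) / len(train)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda x, t=best_t: int(x[0] >= t)


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Synthetic toy data (not from the MJ dataset): label 1 when value >= 130.
    data = [([float(s)], int(s >= 130)) for s in range(100, 160)]
    models = train_fold_ensemble(data, train_stump, k=5)
    print(vote(models, [145.0]), vote(models, [110.0]))
    print(round(f1(0.6959, 0.7790), 4))  # reported precision and recall
```

With the reported precision of 69.59% and recall of 77.90%, the `f1` helper gives about 0.7351, matching the reported F1-score of 73.51%. In a real pipeline `train_stump` would be replaced by the actual model and the features by the full checkup history.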