The interpretable machine learning model associated with metal mixtures to identify hypertension via EMR mining method.

Site Xu, Mu Sun

doi:10.1111/jch.14768

Abstract

There are limited data on the association between hypertension and heavy metal exposure. The authors aimed to establish an efficient, robust, and interpretable machine learning (ML) model that identifies hypertension based on heavy metal exposure. Data were obtained from the US National Health and Nutrition Examination Survey (NHANES, 2013 to March 2020). The authors developed five ML models to identify hypertension from heavy metal exposure and evaluated them using 10 discrimination metrics. They then selected the best-performing model after hyperparameter tuning with a genetic algorithm (GA). Finally, to visualize how the model reaches its decisions, the authors used the SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME) algorithms to illustrate feature contributions. A total of 19,368 participants were included. The best-performing model was a GA-tuned eXtreme Gradient Boosting (XGB) model that identified hypertension from 16 heavy metals (AUC: 0.774; 95% CI: 0.772-0.776; accuracy: 87.7%). According to SHAP values, barium (0.02), cadmium (0.017), lead (0.017), antimony (0.008), tin (0.007), manganese (0.006), thallium (0.004), and tungsten (0.004) in urine, and lead (0.048), mercury (0.035), selenium (0.05), and manganese (0.007) in blood positively influenced the model, whereas cadmium (-0.001) in blood negatively influenced it. An efficient, robust, and interpretable GA-XGB model, explained with SHAP and LIME, identified hypertension associated with heavy metal exposure in study participants. Barium, cadmium, lead, antimony, tin, manganese, thallium, and tungsten in urine, and lead, mercury, selenium, and manganese in blood are positively associated with hypertension, while cadmium in blood is negatively associated with hypertension.
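The abstract states that hyperparameters were tuned with a genetic algorithm but does not give the search space, genetic operators, or settings. The sketch below shows the general shape of GA-based tuning under stated assumptions: the hyperparameter names mirror common XGBoost parameters, but the ranges, population size, tournament selection, uniform crossover, mutation rate, elitism, and the `toy_validation_score` fitness function (a hypothetical stand-in for the cross-validated AUC of a fitted model) are all illustrative choices, not the authors' configuration.

```python
import random

# Hypothetical search space: names mirror common XGBoost hyperparameters,
# but the ranges are illustrative assumptions, not the study's settings.
SPACE = {
    "max_depth": (2, 10),          # integer range (inclusive)
    "learning_rate": (0.01, 0.3),  # float range
    "subsample": (0.5, 1.0),       # float range
}


def toy_validation_score(params):
    """Hypothetical stand-in for cross-validated AUC of a fitted model.

    It peaks at max_depth=5, learning_rate=0.1, subsample=0.8.
    """
    return (
        1.0
        - 0.01 * (params["max_depth"] - 5) ** 2
        - (params["learning_rate"] - 0.1) ** 2
        - (params["subsample"] - 0.8) ** 2
    )


def sample_gene(name, rng):
    # Draw one hyperparameter value uniformly from its range.
    lo, hi = SPACE[name]
    return rng.randint(lo, hi) if name == "max_depth" else rng.uniform(lo, hi)


def random_individual(rng):
    return {name: sample_gene(name, rng) for name in SPACE}


def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from a randomly chosen parent.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in a}


def mutate(ind, rng, rate=0.2):
    # Each gene is resampled from its range with probability `rate`.
    child = dict(ind)
    for name in SPACE:
        if rng.random() < rate:
            child[name] = sample_gene(name, rng)
    return child


def tournament(pop, scores, rng, k=3):
    # Tournament selection: pick k random individuals, return the fittest.
    idx = rng.sample(range(len(pop)), k)
    return pop[max(idx, key=lambda i: scores[i])]


def genetic_search(fitness, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        for ind, s in zip(pop, scores):
            if s > best_score:
                best, best_score = ind, s
        # Elitism: the best individual seen so far survives unchanged.
        new_pop = [dict(best)]
        while len(new_pop) < pop_size:
            a = tournament(pop, scores, rng)
            b = tournament(pop, scores, rng)
            new_pop.append(mutate(crossover(a, b, rng), rng))
        pop = new_pop
    return best, best_score


best_params, best_score = genetic_search(toy_validation_score)
print(best_params, round(best_score, 4))
```

In a real pipeline the fitness function would train the classifier with the candidate parameters and return a validation metric such as AUC, so every fitness call is a full model fit; population size and generation count therefore dominate the tuning cost.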
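SHAP attributes a single prediction to its input features using Shapley values: a positive value pushes that prediction up and a negative value pushes it down, which is how the signs of the SHAP values reported in the abstract are read. How the authors summarized per-participant values into single numbers per metal is not stated here. As a minimal sketch of the underlying definition, the code below computes exact Shapley values by enumerating every feature coalition, assuming a single baseline point stands in for "absent" features; `toy_model` and its coefficients are made up for illustration and do not come from the study. Practical explainers for tree ensembles (for example, the `shap` library's `TreeExplainer`) use a much faster tree-specific algorithm and typically a background dataset rather than one baseline.

```python
from itertools import combinations
from math import factorial, isclose


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x relative to a single baseline point.

    v(S) evaluates f with features in S taken from x and the rest from the
    baseline. Cost is O(2^n) model calls, so this only suits a few features.
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical toy risk score with made-up coefficients and one interaction.
def toy_model(z):
    a, b, c = z
    return 0.3 * a + 0.5 * b - 0.2 * c + 0.1 * a * b


x = [1.0, 2.0, 1.5]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(p, 3) for p in phi])  # prints [0.4, 1.1, -0.3]

# Efficiency property: attributions sum to f(x) - f(baseline).
assert isclose(sum(phi), toy_model(x) - toy_model(baseline))
```

The 0.2 interaction contribution (0.1 × 1.0 × 2.0) is split equally between the first two features, and the third feature receives a negative attribution because it lowers the score; the attributions sum exactly to f(x) minus f(baseline), the efficiency property that SHAP relies on.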
