Machine learning algorithms identify hypokalaemia risk in people with hypertension in the United States National Health and Nutrition Examination Survey 1999–2018

Bernard Man Yung Cheung, Yuen Ting Cheng, Ziying Lin

doi:10.1080/07853890.2023.2209336


Open Access

https://doi.org/10.1080/07853890.2023.2209336


Journal: Annals of Medicine
Publication Date: May 10, 2023
Citations: 4
License type: open-access

Affiliation: University of Hong Kong

Abstract

Background: Hypokalaemia is a side-effect of diuretics. We aimed to use machine learning to identify features predicting hypokalaemia risk in hypertensive patients.

Methods: Participants with hypertension in the United States National Health and Nutrition Examination Survey 1999–2018 were included for analysis. To select the most suitable algorithm, we tested and evaluated five machine learning algorithms commonly employed in epidemiological studies: Logistic Regression, k-Nearest Neighbor, Random Forest, Recursive Partitioning and Regression Trees, and eXtreme Gradient Boosting. These algorithms were assessed using a set of 38 screened features. We then used SHapley Additive exPlanations (SHAP) values to select the key hypokalaemia-associated features in the hypertension group and its cardiovascular disease (CVD) subgroup, and to determine each feature's pattern of impact on hypokalaemia risk.

Results: A total of 25,326 hypertensive participants were included for analysis, of whom 4,511 had known CVD. The Random Forest algorithm had the highest area under the receiver operating characteristic curve (AUROC) (hypertension dataset: 0.73 [95% CI, 0.71–0.76]; CVD subgroup: 0.72 [95% CI, 0.66–0.78]). Moreover, the nomogram based on the top twelve key features screened by Random Forest retained good performance. In the hypertension dataset, these features were age, sex, race, poverty income ratio, body mass index, systolic and diastolic blood pressure, non-potassium-sparing diuretic use and duration, renin-angiotensin blocker use and duration, and CVD history; in the CVD subgroup, the additional key features were comorbid diabetes, education level, smoking status, and use of bronchodilators.

Conclusion: Our predictive model based on the Random Forest algorithm performed best among the five algorithms tested. Hypokalaemia-associated key features have been identified in hypertensive patients and in the subgroup with CVD. These findings from machine learning facilitate the development of artificial intelligence to highlight hypokalaemia risk in hypertensive patients.

Key messages: Our predictive model based on the Random Forest algorithm performed best among the five algorithms tested, and hypokalaemia-associated key features have been identified in hypertensive patients and in the subgroup with cardiovascular disease. The nomogram we developed, which includes twelve key features, might be useful in primary clinical consultations to identify hypertensive patients at risk of hypokalaemia. These findings from machine learning facilitate the development of artificial intelligence to highlight hypokalaemia risk in hypertensive patients.
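The abstract compares models by AUROC with a 95% confidence interval. As a rough illustration of that metric only, the sketch below computes AUROC from ranks and attaches a percentile bootstrap interval. The abstract does not say how the reported intervals were obtained, so the bootstrap is an assumption, and the function names `auroc` and `bootstrap_ci` and the synthetic data are hypothetical, not the authors' code.

```python
import random


def auroc(labels, scores):
    """Probability that a random positive scores above a random negative.

    Uses the rank-sum (Mann-Whitney) formulation; tied scores share an
    average rank, so a tie between a positive and a negative counts as 0.5.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")

    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (an assumed method)."""
    auroc(labels, scores)  # fail early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples containing a single class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Synthetic stand-in data: roughly 10% positives with shifted scores.
    rng = random.Random(42)
    y = [1 if rng.random() < 0.1 else 0 for _ in range(500)]
    s = [rng.gauss(0.6 if label else 0.4, 0.2) for label in y]
    print("AUROC:", round(auroc(y, s), 3), "95% CI:", bootstrap_ci(y, s))
```

To compare several models in this way, one would call `auroc` on each model's predicted probabilities for the same held-out participants and compare the resulting values and intervals.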

Full Text