Abstract

South Africa was the African country most affected by the coronavirus disease 2019 (COVID-19) pandemic, with over 4 million confirmed cases and over 102,000 deaths recorded since 2019. Aside from clinical methods, artificial intelligence (AI)-based solutions such as machine learning (ML) models have been employed in treating COVID-19 cases. However, limited application of AI for COVID-19 in Africa has been reported in the literature. This study aimed to investigate the performance and interpretability of several ML algorithms, including deep multilayer perceptron (Deep MLP), support vector machine (SVM), and extreme gradient boosting trees (XGBoost), for predicting COVID-19 mortality risk, with an emphasis on the effect of cross-validation (CV) and principal component analysis (PCA) on the results. For this purpose, a dataset with 154 features from 490 COVID-19 patients admitted to the intensive care unit (ICU) of Tygerberg Hospital in Cape Town, South Africa, during the first wave of COVID-19 in 2020 was retrospectively analysed. Our results show that Deep MLP had the best overall performance (F1 = 0.92; area under the curve (AUC) = 0.94) when CV and the synthetic minority oversampling technique (SMOTE) were applied without PCA. Using SHapley Additive exPlanations (SHAP) to interpret the mortality risk predictions, we identified Length of stay (LOS) in the hospital, LOS in the ICU, Time to ICU from admission, days discharged alive or death, D-dimer (blood clotting factor), and blood pH as the six most critical variables for mortality risk prediction. In addition, Age at admission, Pf ratio (PaO2/FiO2 ratio), troponin T (TropT), ferritin, ventilation, C-reactive protein (CRP), and symptoms of acute respiratory distress syndrome (ARDS) were associated with the severity and fatality of COVID-19 cases.
The study reveals how ML could assist medical practitioners in making informed decisions on handling critically ill COVID-19 patients with comorbidities. It also offers insight into the combined effect of CV, PCA, and SMOTE on the performance of ML models for COVID-19 mortality risk prediction, which has been little explored.
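
As an illustration of the kind of comparison described above (CV and SMOTE, with and without PCA), the following is a minimal sketch and not the authors' pipeline. It assumes scikit-learn, uses synthetic data with the same shape as the study's dataset (490 rows, 154 features) but an arbitrary class ratio, substitutes a small scikit-learn MLP for the Deep MLP, and uses a simplified, hand-written SMOTE-style oversampler. The names `smote_like` and `cross_validate`, the layer sizes, and the 95% variance threshold for PCA are all hypothetical choices made for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import NearestNeighbors
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def smote_like(X, y, k=5, seed=0):
    """Simplified SMOTE for binary labels (hypothetical helper).

    New minority samples are interpolated between a minority point and
    one of its k nearest minority neighbours until classes are balanced.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_new = counts.max() - counts.min()
    X_min = X[y == minority]
    if n_new == 0 or len(X_min) < 2:
        return X, y
    k = min(k, len(X_min) - 1)
    # Ask for k + 1 neighbours because each point is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    X_new = X_min[base] + gap * (X_min[picked] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])


def cross_validate(X, y, use_pca, n_splits=5, seed=0):
    """Mean F1 and AUC over stratified folds; oversampling touches training folds only."""
    f1s, aucs = [], []
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in cv.split(X, y):
        steps = [StandardScaler()]
        if use_pca:
            # Keep enough components to explain 95% of the variance (assumed threshold).
            steps.append(PCA(n_components=0.95))
        steps.append(
            MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=seed)
        )
        model = make_pipeline(*steps)
        X_tr, y_tr = smote_like(X[train_idx], y[train_idx], seed=seed)
        model.fit(X_tr, y_tr)
        prob = model.predict_proba(X[test_idx])[:, 1]
        f1s.append(f1_score(y[test_idx], model.predict(X[test_idx])))
        aucs.append(roc_auc_score(y[test_idx], prob))
    return float(np.mean(f1s)), float(np.mean(aucs))


# Synthetic stand-in for the clinical data; the 70/30 class split is arbitrary.
X, y = make_classification(
    n_samples=490, n_features=154, n_informative=20,
    weights=[0.7, 0.3], random_state=0,
)
for use_pca in (False, True):
    f1, auc = cross_validate(X, y, use_pca=use_pca)
    print(f"PCA={use_pca}: F1={f1:.2f}, AUC={auc:.2f}")
```

Oversampling inside each training fold, rather than before splitting, keeps synthetic points derived from test patients out of the training data. The scores printed for synthetic data say nothing about the study's reported results.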