Malaysia reported its first confirmed COVID-19 case, an imported one, on 23 January 2020. The case was detected among eight close contacts in Johor. The COVID-19 pandemic has significantly affected the global health landscape, with mortality and survival being the critical outcomes of interest. This study aims to predict COVID-19 survival in Malaysia from demographic factors using machine learning approaches. The dataset comprises demographic information on 2,151,315 COVID-19 patients, including nationality, region, age group, gender, medical history, vaccine brand, and the number of vaccine doses received between 2020 and 2022. Four machine learning algorithms, namely Logistic Regression, Naïve Bayes, Support Vector Machine, and Artificial Neural Network (ANN), were employed to assess the relationship between demographic factors and COVID-19 survival. To evaluate model performance, the data were prepared in two configurations: imbalanced and balanced (by down-sampling). The results indicate that models trained on the balanced dataset outperform those trained on the imbalanced dataset in terms of overall accuracy, sensitivity, specificity, precision, and Area Under the Curve (AUC). The ANN classifier exhibited the highest performance, with a specificity of 95.2% on the balanced dataset. The model excels at accurately identifying survivors, thereby minimizing false mortality predictions, and is selected as the best model for predicting COVID-19 survival. Its capacity to process large sample sizes, combined with its numerous interconnected nodes, enables it to identify complex patterns and extract meaningful insights from diverse data such as demographic factors. Additionally, the optimization of parameters, including the number of layers, the learning rate, and the activation functions, contributed significantly to its superior accuracy.
The study identifies chronic disease, male gender, and age 45 and above as notable factors associated with lower survival rates among COVID-19 patients. The findings underscore the importance of completing the vaccination series by obtaining at least the second dose, as the first dose alone may not offer sufficient protection. In conclusion, this study achieves its objectives by identifying the optimal dataset configuration and predictive model for forecasting COVID-19 survival based on demographic factors. The ANN could serve as a benchmark classifier, offering a valuable tool to predict survival, promote vaccination, and optimize the general healthcare system during a pandemic outbreak. The study contributes to the theoretical understanding of effective COVID-19 prediction and also highlights the practical implications of integrating advanced machine learning techniques into pandemic management strategies. Future research can build on these findings by exploring additional machine learning techniques and by considering geographical and environmental factors to further enhance the accuracy of long-term predictions.
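The balanced configuration and the confusion-matrix metrics named above can be illustrated with a minimal sketch. It is not the study's pipeline: the record layout, the `died` label, and the helper names `downsample` and `binary_metrics` are hypothetical, and death is assumed to be the positive class so that specificity measures correctly identified survivors, as the abstract describes. AUC is omitted because it requires predicted scores rather than hard labels.

```python
import random
from collections import Counter


def downsample(rows, label_key="died", seed=0):
    """Randomly drop majority-class rows until every class has the minority size."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity and precision from hard predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),   # deaths correctly flagged
        "specificity": ratio(tn, tn + fp),   # survivors correctly identified
        "precision": ratio(tp, tp + fp),
    }


# Toy, made-up records: 5 deaths (assumed positive class) and 95 survivors.
rows = [{"age_group": "45+", "died": 1}] * 5 + [{"age_group": "<45", "died": 0}] * 95
balanced = downsample(rows)
counts = Counter(r["died"] for r in balanced)  # both classes now have 5 rows

metrics = binary_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
```

Down-sampling discards majority-class records, so in practice it would be applied only to the training split, with the metrics computed on held-out data.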