Abstract

Several fast-spreading illnesses threaten public health: dengue hemorrhagic fever (DHF) is transmitted by mosquitoes, COVID-19 spreads through respiratory droplets and contact with contaminated surfaces, and varicella spreads through direct contact. Transmission of these diseases can be reduced if medical services identify them early. However, the performance of machine-learning prediction models is limited by the availability of labeled patient data. This study provides empirical evidence that synthetic data, generated from actual medical records, can improve the performance of such models. A Decision Tree trained on a mix of synthetic and actual data achieved an average accuracy of 91.98%, which is higher than that of a model trained on real data only. Model interpretation with Shapley Additive Explanations (SHAP), which has the advantage of measuring the overall dominance of each feature, indicates that the five most important features are vThrombocyte, vTemp, vCough, vSpot, and vNauseous.
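
To illustrate how a Shapley-based ranking of "overall dominant features" can be obtained, the following minimal sketch computes exact Shapley values by enumerating feature subsets and then averages their absolute values across records. It is not the study's pipeline: `toy_model`, the records, and the baseline are made-up assumptions, only three of the reported feature names are reused as labels, and exact enumeration stands in for the approximations a SHAP library would normally apply to a trained Decision Tree.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one record; absent features take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def global_importance(model, records, baseline):
    """Mean absolute Shapley value per feature across all records."""
    totals = [0.0] * len(baseline)
    for x in records:
        for i, value in enumerate(shapley_values(model, x, baseline)):
            totals[i] += abs(value)
    return [t / len(records) for t in totals]


# Hypothetical stand-ins: the feature names come from the abstract, everything else is invented.
FEATURES = ["vThrombocyte", "vTemp", "vCough"]


def toy_model(x):
    """Made-up scoring rule, used only to exercise the functions above."""
    low_platelets = 1.0 if x[0] < 150 else 0.0
    fever = 1.0 if x[1] >= 38.0 else 0.0
    return 0.6 * low_platelets * fever + 0.3 * low_platelets + 0.1 * x[2]


records = [[90, 39.0, 1], [250, 38.5, 1], [120, 36.8, 0]]  # invented records
baseline = [250, 36.8, 0]  # assumed reference record

importance = global_importance(toy_model, records, baseline)
ranking = sorted(zip(FEATURES, importance), key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

For each record, the Shapley values sum to the difference between the model output for that record and the output for the baseline, which is what makes the per-feature averages comparable. The resulting order reflects only the invented rule in `toy_model`, not the study's findings.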
