Abstract

The novel coronavirus spreads from person to person through close contact and through respiratory droplets produced by coughing or sneezing. Many studies have been conducted worldwide in response to COVID-19. However, no cure for the virus has been found, and no efficient data processing method for sudden outbreaks has yet been identified. This study compares three clustering algorithms on the same data set to determine which one processes the data best. The data come from the Chinese Center for Disease Control and Prevention and include two attributes: confirmed cases and deaths. We selected data from the initial stage of the outbreak until October 31, 2021. We compared the results of clustering the spread of the novel coronavirus in China with the K-Means, K-Medoids and K-Means++ algorithms. Comparing the Calinski-Harabasz index values from K=2 to K=10 shows that the three algorithms have almost the same clustering effect when K does not exceed 6, but that K-Medoids clusters significantly better when K is greater than 6. Of the three clustering algorithms used, we therefore conclude that K-Medoids is the best method for clustering the spread of the novel coronavirus outbreak in China. The results of this study provide guidance for future researchers in choosing an appropriate cluster analysis method to process data effectively in the early stages of an epidemic.
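The comparison described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' implementation: the (confirmed, deaths) pairs are synthetic, all function names are hypothetical, and K-Medoids is written here as the simple alternating (assign, then re-pick medoid) variant, which may differ from the variant used in the study.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def medoid(members):
    # The member with the smallest total Euclidean distance to the others.
    return min(members, key=lambda m: sum(sq_dist(m, q) ** 0.5 for q in members))

def init_random(points, k, rng):
    return rng.sample(points, k)

def init_plus_plus(points, k, rng):
    # K-Means++ seeding: pick each new center with probability
    # proportional to its squared distance from the nearest chosen center.
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

def assign(points, centers):
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def cluster(points, k, init, update, rng, iters=100):
    # Shared loop: `update` is `mean` for K-Means and `medoid` for K-Medoids.
    centers = init(points, k, rng)
    labels = assign(points, centers)
    for _ in range(iters):
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            new_centers.append(update(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
        labels = assign(points, centers)
    return labels

def calinski_harabasz(points, labels):
    # Ratio of between-cluster to within-cluster dispersion,
    # each divided by its degrees of freedom; higher is better.
    groups = {}
    for p, l in zip(points, labels):
        groups.setdefault(l, []).append(p)
    n, k = len(points), len(groups)
    overall = mean(points)
    between = within = 0.0
    for g in groups.values():
        c = mean(g)
        between += len(g) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in g)
    return (between / (k - 1)) / (within / (n - k))

# Synthetic (confirmed, deaths) pairs: an assumption, NOT the study's data.
data_rng = random.Random(0)
blobs = [(50, 1), (500, 10), (2000, 60)]
points = [(data_rng.gauss(cx, cx * 0.1), data_rng.gauss(cy, cy * 0.2 + 0.5))
          for cx, cy in blobs for _ in range(20)]

methods = {
    "K-Means": (init_random, mean),
    "K-Means++": (init_plus_plus, mean),
    "K-Medoids": (init_random, medoid),
}
scores = {}
for name, (init, update) in methods.items():
    for k in range(2, 11):
        labels = cluster(points, k, init, update, random.Random(k))
        scores[name, k] = calinski_harabasz(points, labels)

for k in range(2, 11):
    print(k, *(f"{name}={scores[name, k]:.1f}" for name in methods))
```

A higher Calinski-Harabasz value indicates tighter, better-separated clusters, so the algorithm with the highest value at a given K is preferred at that K. On real case counts, the attributes would usually be scaled first so that confirmed cases do not dominate the distance.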

