Regional Clustering Based on Types of Non-Communicable Diseases Using k-Means Algorithm

Tb Ai Munandar, Ajif Yunizar Yusuf Pratama

doi:10.30812/matrik.v23i2.3352

Abstract

Noncommunicable diseases (NCDs) have become a global threat to public health. Designing appropriate interventions requires a comprehensive understanding of their geographic and epidemiological distribution. The objective of this study is to cluster the regions of Banten Province by their NCD profiles using an unsupervised learning technique, the k-means algorithm, which groups regions according to the types of non-communicable disease they exhibit. Before the cluster analysis, NCD prevalence data from various health sources were processed and normalised. The research is divided into two scenarios. In the first, clustering is performed on the data including the outliers identified by outlier analysis. In the second, those outliers are excluded. Comparing the two scenarios shows how outliers change the resulting regional clusters. The silhouette index is used to assess the validity of the cluster results, and the clusters are then analysed in depth to identify the geographic and socioeconomic patterns associated with each cluster's NCD profile. With a mean silhouette index of 0.812, k = 2 clusters gives the optimal k-means result in this case. The first cluster (C1), which groups 202 regions, calls for particular attention to five NCDs: diabetes, hypertension, obesity, stroke, and cataracts. The second cluster (C2) contains six regions that are susceptible not only to these five NCDs but also to breast cancer, cervical cancer, heart disease, chronic obstructive pulmonary disease (COPD), and congenital deafness.
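The pipeline in the abstract (normalise prevalence data, run k-means, then choose k by the mean silhouette index) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes min-max scaling (the abstract says only "normalisation") and Euclidean distance. The function names (`min_max_normalise`, `kmeans`, `silhouette`, `choose_k`) and the six-region data are hypothetical and illustrative only. Outlier handling is not shown. The code uses only the Python 3.8+ standard library.

```python
import math
import random


def min_max_normalise(rows):
    """Rescale every column (one disease) to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's k-means with random restarts; returns labels of the lowest-inertia run."""
    rng = random.Random(seed)
    best_labels, best_inertia = None, math.inf
    for _run in range(n_init):
        centroids = [list(p) for p in rng.sample(points, k)]
        labels = None
        for _step in range(max_iter):
            # Assignment step: each region goes to its nearest centroid.
            new_labels = [
                min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
            ]
            if new_labels == labels:
                break  # converged: assignments no longer change
            labels = new_labels
            # Update step: move each centroid to the mean of its members.
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:
                    centroids[j] = [sum(c) / len(members) for c in zip(*members)]
        inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def silhouette(points, labels):
    """Mean silhouette coefficient, s(i) = (b - a) / max(a, b), over all points."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(same) / len(same)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def choose_k(points, k_values):
    """Return the k with the highest mean silhouette, plus the score for every k."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Hypothetical prevalence rates (%) for six regions and three diseases;
# illustrative numbers only, not data from the study.
raw = [
    [7.1, 30.2, 12.0],
    [6.8, 29.5, 11.4],
    [7.4, 31.0, 12.3],
    [15.2, 45.1, 25.0],
    [14.8, 44.0, 24.1],
    [15.5, 46.3, 25.6],
]
X = min_max_normalise(raw)
best_k, scores = choose_k(X, range(2, 5))
labels = kmeans(X, best_k)
print(best_k, labels, {k: round(s, 3) for k, s in scores.items()})
```

With two well-separated groups of regions, k = 2 should have the highest mean silhouette, which mirrors the kind of selection the abstract reports. For real data, scikit-learn's `KMeans` and `silhouette_score` provide tested implementations of the same steps.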
