Discovering the Optimal Number of Crime Cluster Using Elbow, Silhouette, Gap Statistics, and NbClust Methods

Noviyanti T M Sagala, Alexander Agung Santoso Gunawan

doi:10.21512/comtech.v13i1.7270

Open Access

https://doi.org/10.21512/comtech.v13i1.7270

Abstract

In recent years, analyzing and tracking crime has become critical for identifying trends and associations in crime patterns and activities. Such analysis is generally conducted to discover the areas or locations where crime is high or low, using various clustering methods, including k-means clustering. Although the k-means algorithm is widely used because of its simplicity, convergence speed, and high efficiency, finding the optimal number of clusters is difficult. Determining the correct number of clusters for crime analysis is critical to improving current crime resolution rates, preventing future incidents, reducing the time required of new officers, and increasing the quality of activities. To address the problem of estimating the number of clusters in the crime domain without human intervention, the research applied the Elbow, Silhouette, Gap Statistics, and NbClust methods to the Major Crime Indicators (MCI) datasets for 2014−2019. The crime datasets were processed in four stages: data understanding, data preparation, cluster modelling, and cluster validation. The first two stages were performed in the RStudio environment and the last two in Azure Studio. In the experimental results, the Elbow, Silhouette, and NbClust methods suggest the same optimal number of clusters, namely two. After validating the result using the average Silhouette method, the research considers two clusters the best choice for the dataset. The Silhouette visualization shows a value of 0.73, indicating that the observations are well grouped and placed in the correct clusters.
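
To illustrate how the Elbow and average Silhouette criteria pick a cluster count, here is a minimal sketch in pure Python. It is not the authors' R or Azure workflow and does not use the MCI data: the two synthetic point clouds, the helper names (`kmeans`, `wss`, `mean_silhouette`), and the candidate range of k are all assumptions made for this example.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means on tuples; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k random points
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

def wss(points, labels, centroids):
    """Within-cluster sum of squares, the quantity plotted in the Elbow method."""
    return sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))

def mean_silhouette(points, labels):
    """Average silhouette width s = (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0  # silhouette is undefined for a single cluster
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Synthetic data (an assumption): two well-separated 2-D point clouds.
rng = random.Random(42)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
points += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(30)]

results = {}
for k in range(2, 6):
    labels, centroids = kmeans(points, k)
    results[k] = (wss(points, labels, centroids), mean_silhouette(points, labels))
    print(f"k={k}  WSS={results[k][0]:.1f}  mean silhouette={results[k][1]:.2f}")

# Silhouette criterion: choose the k with the highest average silhouette.
best_k = max(results, key=lambda k: results[k][1])
print("k chosen by average silhouette:", best_k)
```

The Elbow method is usually read visually, by looking for the k after which the WSS curve flattens, whereas the Silhouette criterion simply takes the k with the highest average width. An average width close to 1 suggests that observations sit well inside their assigned clusters.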

Full Text