This paper examines the application of Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) to Automated Fare Collection (AFC) transaction data from train travelers in Jakarta, Bogor, Depok, Tangerang, and Bekasi (Jabodetabek), Indonesia. To enhance the clustering process, Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction and the DenseClus library are employed. Different combinations of hyperparameters are evaluated to identify the configuration that produces dense, clearly separated clusters. The results demonstrate that applying HDBSCAN to UMAP-reduced data effectively discerns unique trip patterns and highlights notable disparities in travel distance, time, and length among clusters. The UMAP intersection method was notably effective at preserving the local structure of the data, resulting in distinct and meaningful clusters. In addition, categorical data were transformed into numerical form using hashing techniques, which addresses the difficulties posed by a high number of categories and keeps data processing efficient. The results offer insights into the application of density-based clustering to complex transportation data, with implications for route planning and capacity management for Jabodetabek commuters.
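As a rough illustration of the hashing step mentioned above, the following sketch maps high-cardinality categorical values to a fixed number of integer buckets. The abstract does not state which hashing technique was used, so the MD5-based scheme, the function names `hash_category` and `hash_encode`, the bucket count, and the station names are all assumptions made for illustration only.

```python
import hashlib


def hash_category(value: str, n_buckets: int = 32) -> int:
    """Map a category label to a stable bucket index in [0, n_buckets)."""
    # A cryptographic digest is used only because it is deterministic across
    # runs, unlike Python's built-in hash(), which is salted per process.
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


def hash_encode(values, n_buckets: int = 32):
    """Encode a list of category labels as a fixed-length count vector."""
    vector = [0] * n_buckets
    for value in values:
        vector[hash_category(value, n_buckets)] += 1
    return vector


# Hypothetical origin and destination labels for a single trip record.
trip = ["origin=Bogor", "destination=Depok"]
encoded = hash_encode(trip, n_buckets=32)
```

The output length depends only on `n_buckets`, not on the number of distinct categories, which is what keeps the representation compact. The trade-off is that unrelated categories can collide in the same bucket.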