Abstract
Understanding the visiting patterns of foreign tourists supports data-driven decision making. This study examines one such pattern: the clustering of foreign tourist visits to tourist destinations in Jakarta. Data mining extracts knowledge patterns from a dataset, and K-Means is a data mining algorithm for clustering, in which data points are grouped by the similarity of their features and attributes. This study compares the Euclidean, Manhattan, and Haversine distance measures to determine which yields the most representative clusters for the datasets. The datasets are not normally distributed because they contain outliers. The DBSCAN algorithm is therefore used to handle the outliers rather than removing or trimming records, since deletion would leave a significant amount of missing data and could produce information that does not align with empirical facts. Five clusters were formed, based on the result of the elbow method. K-Means with Euclidean distance yielded a Silhouette Score of 0.36, an Inertia of 0.86, and a Davies-Bouldin Index of 2.39. Manhattan distance yielded a Silhouette Score of 0.65, an Inertia of 1.46, and a Davies-Bouldin Index of 0.47. Haversine distance yielded a Silhouette Score of 0.36, an Inertia of 0.03, and a Davies-Bouldin Index of 2.39.
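For readers unfamiliar with the three distance measures compared here, the following is a minimal sketch of each, not the study's own implementation. The function names, the (latitude, longitude) input order in degrees, and the mean Earth radius of 6371 km are assumptions made for illustration.

```python
import math

# Assumed constant: mean Earth radius in kilometres.
EARTH_RADIUS_KM = 6371.0


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = (math.radians(v) for v in a)
    lat2, lon2 = (math.radians(v) for v in b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine of the central angle between the two points.
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))
```

Euclidean and Manhattan distance treat the inputs as plain feature vectors, whereas Haversine distance is only meaningful for geographic coordinates. How each measure was combined with K-Means in the study is not stated in the abstract.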