UMAP for Geospatial Data Visualization

I De Zarzà, J De Curtò, Carlos T Calafate

doi:10.1016/j.procs.2023.10.155

Abstract

In this paper, we examine the efficacy of unsupervised learning approaches, particularly clustering and dimensionality reduction techniques, in practical applications such as image compression and geospatial data visualization. First, we assess the applicability of the elbow rule for determining the optimal number of clusters in synthetic datasets of varying structure using the k-means algorithm. We then evaluate how well density-based (DBSCAN) and hierarchical clustering algorithms uncover intrinsic patterns in these datasets. Turning to practical applications, we use k-means clustering for image compression and demonstrate a notable reduction in storage requirements without a significant loss of visual quality. Our findings indicate that a careful choice of the number of clusters can yield substantial compression rates. Finally, we investigate dimensionality reduction techniques, specifically t-SNE and UMAP, for geospatial data visualization. Using a dataset of distances between Spanish provinces, we measure how well these techniques preserve relative distances when projected onto a two-dimensional plane. We observe that both t-SNE and UMAP can generate accurate visual representations of the original geographic layout, with UMAP performing exceptionally well compared with alternative methods. More broadly, our study underscores the versatility and practical utility of unsupervised learning techniques across applications ranging from image compression to geospatial data visualization, and emphasizes the importance of understanding their underlying mechanisms and tuning their hyperparameters to achieve optimal performance.
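
To make the link between the number of clusters and the compression rate concrete, the following is a minimal sketch of the storage arithmetic for k-means colour quantization. It is an illustration and not the authors' implementation: the function name `kmeans_compression_ratio`, the assumption of an 8-bit RGB image, and the palette-plus-index storage layout are all assumptions introduced here.

```python
def kmeans_compression_ratio(width, height, k, channels=3, bits_per_channel=8):
    """Estimate raw size / quantized size for an image reduced to k colours.

    Assumed (hypothetical) storage layout: a palette holding the k cluster
    centroids, plus one fixed-width palette index per pixel.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    pixels = width * height
    # Uncompressed image: every pixel stores all channels in full.
    raw_bits = pixels * channels * bits_per_channel
    # Bits needed to address k palette entries (at least 1 bit per pixel).
    index_bits = max(1, (k - 1).bit_length())
    # Palette: one full-precision colour per cluster centroid.
    palette_bits = k * channels * bits_per_channel
    compressed_bits = pixels * index_bits + palette_bits
    return raw_bits / compressed_bits


if __name__ == "__main__":
    # Fewer clusters give a higher ratio, at the cost of colour fidelity.
    for k in (4, 16, 64, 256):
        print(k, round(kmeans_compression_ratio(512, 512, k), 2))
```

Under these assumptions, a 512 x 512 RGB image quantized to 16 colours needs 4 bits per pixel instead of 24, so the ratio is just under 6; the palette overhead is negligible at this image size. Any further entropy coding applied on top of the quantized image is not modelled here.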

Full Text