Abstract

Numerical modelling increasingly generates massive, high-dimensional spatio-temporal datasets. Exploring such datasets relies on effective visualization. This study presents a generic workflow to (i) project high-dimensional spatio-temporal data accurately onto a two-dimensional (2D) plane, (ii) compare dimensionality reduction techniques (DRTs) in terms of resolution and computational efficiency, and (iii) represent the 2D projection spatially using a 2D perceptually uniform background color map. Four machine learning (ML) based DRTs for data visualization, namely principal component analysis (PCA), generative topographic mapping (GTM), t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP), are compared in terms of accuracy, resolution and computational efficiency in handling massive datasets. The accuracy of visualization is evaluated using a quality metric based on a co-ranking framework. The workflow is applied to output from an Australian Water Resource Assessment (AWRA) model for Tasmania, Australia. The dataset consists of daily time series of nine components of the water balance at a 5 km grid cell resolution for the year 2017. The case study shows that PCA allows rapid visualization of global data structures, while t-SNE and UMAP allow more accurate representation of local trends. Furthermore, UMAP is computationally more efficient than t-SNE and less affected by outliers than GTM.

Highlights

  • One of the biggest challenges of the big data era is to make sense out of all the information available

  • This study focuses on one linear and three non-linear machine learning (ML) based unsupervised dimensionality reduction techniques (DRTs) for visualization, namely principal component analysis (PCA) [16], generative topographic mapping (GTM) [19], t-distributed stochastic neighbor embedding (t-SNE) [12] and uniform manifold approximation and projection (UMAP) [43]. These are summarized in Table 1 along with their respective computational efficiencies, expressed in terms of the data points, denoted P

  • The suggested workflow applied DRTs to visualize the hydrological components of the multivariate Australian Water Resource Assessment (AWRA) model output in order to determine the prominent spatio-temporal features, followed by a quantification step to assess the visualization accuracy
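As an illustration of how a co-ranking based accuracy score can be computed, the sketch below implements one common choice, the neighbourhood-preservation curve Q_NX(K): the average fraction of each point's K nearest neighbours in the original space that are still among its K nearest neighbours in the projection. Whether the study uses exactly this quantity is an assumption; the function names `pairwise_ranks` and `q_nx` and the toy data are hypothetical and serve only the example.

```python
import math
import random


def pairwise_ranks(points):
    """ranks[i][j] = position of j among i's neighbours, 1 = nearest (ties broken by index)."""
    n = len(points)
    ranks = [[0] * n for _ in range(n)]
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        for r, j in enumerate(order, start=1):
            ranks[i][j] = r
    return ranks


def q_nx(high, low, k):
    """Share of K-nearest-neighbour pairs in `high` that are also K-nearest in `low`.

    Equivalent to summing the upper-left K x K block of the co-ranking
    matrix and dividing by K * N. A value of 1.0 means neighbourhoods of
    size K are perfectly preserved.
    """
    n = len(high)
    rank_high = pairwise_ranks(high)
    rank_low = pairwise_ranks(low)
    kept = sum(1 for i in range(n) for j in range(n)
               if i != j and rank_high[i][j] <= k and rank_low[i][j] <= k)
    return kept / (k * n)


# Toy data (assumption): 40 points in 3D whose third axis carries almost no variation.
random.seed(0)
high = [(random.random(), random.random(), 0.01 * random.random()) for _ in range(40)]
flat = [(x, y) for x, y, _ in high]                          # drop the near-constant axis
noise = [(random.random(), random.random()) for _ in high]   # layout unrelated to the data

print("Q_NX(5), informative projection:", q_nx(high, flat, 5))
print("Q_NX(5), unrelated layout:      ", q_nx(high, noise, 5))
```

Small K values probe local structure and large K values probe global structure, which is one way to compare how different DRT outputs trade off local and global accuracy. The brute-force ranking here scales quadratically with the number of points, so it suits small samples rather than a full gridded dataset.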

Summary

Introduction

One of the biggest challenges of the big data era is to make sense of all the information available. Not all of this huge volume of data is informative. Such datasets may contain spatial information, temporal information, or both, and are available either as gridded or as point data. Gridded data are difficult to capture in low-dimensional space, especially in Earth sciences, because of their dynamic and non-linear behavior. Effective data visualization plays a key role in exploring such big datasets and in finding patterns, features and outliers. Such insights are essential to develop hypotheses on the data-generating processes [1].
