Abstract

The traditional flow of coastal ocean model data is from High-Performance Computing (HPC) centers to the local desktop, or to a file server where just the needed data can be extracted via services such as OPeNDAP. Analysis and visualization are then conducted using local hardware and software. This requires moving large amounts of data across the internet as well as acquiring and maintaining local hardware, software, and support personnel. Further, as data sets increase in size, the traditional workflow may not be scalable. Alternatively, recent advances make it possible to move data from HPC to the Cloud and perform interactive, scalable, data-proximate analysis and visualization, with simply a web browser user interface. We use the framework advanced by the NSF-funded Pangeo project, a free, open-source Python system which provides multi-user login via JupyterHub and parallel analysis via Dask, both running in Docker containers orchestrated by Kubernetes. Data are stored in the Zarr format, a Cloud-friendly n-dimensional array format that allows performant extraction of data by anyone without relying on data services like OPeNDAP. Interactive visual exploration of data on complex, large model grids is made possible by new tools in the Python PyViz ecosystem, which can render maps at screen resolution, dynamically updating on pan and zoom operations. Two examples are given: (1) Calculating the maximum water level at each grid cell from a 53-GB, 720-time-step, 9-million-node triangular mesh ADCIRC simulation of Hurricane Ike; (2) Creating a dashboard for visualizing data from a curvilinear orthogonal COAWST/ROMS forecast model.

Highlights

  • Analysis, visualization, and distribution of coastal ocean model data are challenging due to the sheer size of the data involved, with regional simulations commonly in the 10 GB–1 TB range

  • The traditional workflow is to download data to local workstations or file servers from which the data needed can be extracted via services such as OPeNDAP [1]

  • Dask workers perform operations in parallel, and Dask worker clusters can be created on local machines with multiple CPUs, on High-Performance Computing (HPC) with job submission, and on the Cloud via Kubernetes [17] orchestration of Docker [18] containers
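
As a rough illustration of the parallel reduction described in the last highlight, the sketch below computes the maximum water level at each node over all time steps with `dask.array`. The array is small synthetic data standing in for model output; the variable names, array shape, and chunk sizes are assumptions made for this example and are not taken from the paper.

```python
import numpy as np
import dask.array as da

# Hypothetical stand-in for model output: water level with shape (time, node).
# The real ADCIRC case has 720 time steps and about 9 million nodes.
rng = np.random.default_rng(0)
zeta_np = rng.normal(size=(72, 9000))

# Chunk along the node dimension so that each task reduces over all
# time steps for one block of nodes; blocks can be processed in parallel.
zeta = da.from_array(zeta_np, chunks=(72, 1000))

# Build the lazy task graph, then execute it on the available workers.
max_level = zeta.max(axis=0)
result = max_level.compute()

print(result.shape)
```

With no cluster configured, Dask runs this on a local thread pool. The same expression could run unchanged on a larger cluster once a distributed scheduler is attached.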


Summary

Introduction

Analysis, visualization, and distribution of coastal ocean model data are challenging due to the sheer size of the data involved, with regional simulations commonly in the 10 GB–1 TB range. The traditional approach to data access and use is becoming time and cost inefficient. The Cloud and recent advances offer new opportunities for analysis, visualization, and distribution of model data, overcoming these problems [4]. Analysis and visualization take place in the Cloud, close to the data, allowing efficient and cost-effective access. These advances have lowered the barrier of entry and are poised to transform the ability of regular scientists and engineers to collaborate on difficult research.

We converted the model output from NetCDF format to Zarr format, which was developed to allow Cloud-friendly access to n-dimensional array data. Zarr supports the major features of the HDF5 and NetCDF4 data models: self-describing datasets with variables, dimensions, and attributes, as well as groups, chunking, and compression. It is being developed in an open community fashion on GitHub, with contributions from multiple research organizations.

Dask workers perform operations in parallel, and Dask worker clusters can be created on local machines with multiple CPUs, on HPC with job submission, and on the Cloud via Kubernetes [17] orchestration of Docker [18] containers.
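
To suggest why a chunked format lets a client extract a subset without a data service, the sketch below works out which chunks overlap a requested index range. It is a simplified illustration in plain Python, not the Zarr library itself. The helper `chunk_keys_for_slice` and the chunk shape are hypothetical, and the dot-separated key style follows the default convention of the Zarr version 2 storage layout.

```python
from itertools import product

def chunk_keys_for_slice(starts, stops, chunks):
    """Return keys of the chunks overlapping the half-open box [starts, stops)."""
    ranges = []
    for start, stop, size in zip(starts, stops, chunks):
        first = start // size        # index of the first chunk touched
        last = (stop - 1) // size    # index of the last chunk touched
        ranges.append(range(first, last + 1))
    # One key per chunk, e.g. "0.2" for time-chunk 0 and node-chunk 2.
    return [".".join(str(i) for i in idx) for idx in product(*ranges)]

# Assumed layout: 720 time steps x 9,000,000 nodes, in chunks of
# 10 time steps x 1,000,000 nodes (648 chunks in total).
keys = chunk_keys_for_slice(
    starts=(0, 2_500_000),
    stops=(20, 3_200_000),
    chunks=(10, 1_000_000),
)
print(keys)  # ['0.2', '0.3', '1.2', '1.3']
```

Under these assumptions, reading 20 time steps for 700,000 nodes touches only 4 of the 648 chunks, and each can be fetched as an independent object from Cloud storage.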

