Abstract
The traditional flow of coastal ocean model data is from High-Performance Computing (HPC) centers to the local desktop, or to a file server where just the needed data can be extracted via services such as OPeNDAP. Analysis and visualization are then conducted using local hardware and software. This requires moving large amounts of data across the internet as well as acquiring and maintaining local hardware, software, and support personnel. Further, as data sets increase in size, the traditional workflow may not be scalable. Alternatively, recent advances make it possible to move data from HPC to the Cloud and perform interactive, scalable, data-proximate analysis and visualization, using only a web browser as the user interface. We use the framework advanced by the NSF-funded Pangeo project, a free, open-source Python system which provides multi-user login via JupyterHub and parallel analysis via Dask, both running in Docker containers orchestrated by Kubernetes. Data are stored in the Zarr format, a Cloud-friendly n-dimensional array format that allows performant extraction of data by anyone without relying on data services like OPeNDAP. Interactive visual exploration of data on complex, large model grids is made possible by new tools in the Python PyViz ecosystem, which can render maps at screen resolution, dynamically updating on pan and zoom operations. Two examples are given: (1) calculating the maximum water level at each grid cell from a 53-GB, 720-time-step, 9-million-node triangular mesh ADCIRC simulation of Hurricane Ike; (2) creating a dashboard for visualizing data from a curvilinear orthogonal COAWST/ROMS forecast model.
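Rendering "at screen resolution" means the amount of work sent to the browser depends on the number of pixels, not on the number of mesh nodes: the visible points are binned into a fixed width x height grid, and the binning is redone for each new pan or zoom range. The sketch below is a stdlib-only toy illustrating that idea; it is not the API of Datashader or any other PyViz tool, and the function name `rasterize_max` and the max-per-pixel reduction are illustrative assumptions.

```python
"""Toy screen-resolution aggregation of scattered mesh-node values.

Assumption: a simplified stand-in for the idea behind PyViz rasterizing
tools; this is not their API.
"""
from typing import List, Optional, Sequence, Tuple


def rasterize_max(
    xs: Sequence[float],
    ys: Sequence[float],
    values: Sequence[float],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    width: int,
    height: int,
) -> List[List[Optional[float]]]:
    """Bin points inside the viewport into a height x width grid, keeping the
    maximum value per pixel (None where no point lands). Row 0 is y_range[0]."""
    x0, x1 = x_range
    y0, y1 = y_range
    grid: List[List[Optional[float]]] = [[None] * width for _ in range(height)]
    for x, y, v in zip(xs, ys, values):
        # Skip points outside the current pan/zoom viewport.
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue
        # Map coordinates to pixel indices; points on the upper edge go in the last pixel.
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        current = grid[row][col]
        if current is None or v > current:
            grid[row][col] = v
    return grid


if __name__ == "__main__":
    xs, ys, vals = [0.1, 0.2, 0.9, 5.0], [0.1, 0.15, 0.9, 5.0], [1.0, 3.0, 2.0, 9.0]
    print(rasterize_max(xs, ys, vals, (0.0, 1.0), (0.0, 1.0), 2, 2))  # full view
    print(rasterize_max(xs, ys, vals, (0.0, 0.5), (0.0, 0.5), 2, 2))  # zoomed in
```

Zooming in simply calls the function again with a smaller `x_range`/`y_range`; the output stays 2 x 2 regardless of how many nodes the mesh has.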
Highlights
Analysis, visualization, and distribution of coastal ocean model data are challenging due to the sheer size of the data involved, with regional simulations commonly in the 10 GB–1 TB range
The traditional workflow is to download data to local workstations or file servers from which the data needed can be extracted via services such as OPeNDAP [1]
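Extracting "the data needed" via OPeNDAP works by appending a constraint expression to the dataset URL, so the server returns only the requested index ranges. The sketch below builds a DAP2-style hyperslab request (`var[start:stride:stop]`, zero-based, stop inclusive); the server URL, the variable name `zeta`, and the helper name `dap2_subset_url` are hypothetical placeholders.

```python
"""Build an OPeNDAP (DAP2) hyperslab request so only a subset is transferred.

The server URL and variable name used below are hypothetical placeholders.
"""
from typing import Sequence, Tuple


def dap2_subset_url(
    dataset_url: str,
    variable: str,
    hyperslab: Sequence[Tuple[int, int, int]],
    response: str = ".ascii",
) -> str:
    """Return <dataset><response>?var[start:stride:stop]... with one
    bracketed range per dimension (zero-based, stop index inclusive)."""
    slices = "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in hyperslab)
    return f"{dataset_url}{response}?{variable}{slices}"


if __name__ == "__main__":
    # First 10 time steps for the first 1000 nodes of a hypothetical dataset.
    print(dap2_subset_url("https://example.org/opendap/ike.nc", "zeta",
                          [(0, 1, 9), (0, 1, 999)]))
```

`.ascii` returns human-readable text; a binary response such as `.dods` is what client libraries normally request.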
Dask workers perform operations in parallel, and Dask worker clusters can be created on local machines with multiple CPUs, on High-Performance Computing (HPC) systems with job submission, and on the Cloud via Kubernetes [17] orchestration of Docker [18] containers
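The maximum-water-level example parallelizes well because the maximum over time at one node does not depend on any other node: the node dimension can be split into chunks, each chunk reduced by a separate worker, and the partial results concatenated. The sketch below shows that chunked map-then-combine pattern using Python's standard `concurrent.futures` as a stand-in for Dask; it is not Dask's API, and the names `chunk_max` and `parallel_max_over_time` and the tiny `(time, node)` array are illustrative assumptions.

```python
"""Chunked, parallel maximum-over-time reduction in the spirit of Dask.

Assumption: a stdlib stand-in (concurrent.futures), not Dask's API. The
nested list plays the role of a (time, node) water-level variable.
"""
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def chunk_max(zeta: Sequence[Sequence[float]], start: int, stop: int) -> List[float]:
    """One 'task': maximum over all time steps for nodes start..stop-1."""
    return [max(step[node] for step in zeta) for node in range(start, stop)]


def parallel_max_over_time(
    zeta: Sequence[Sequence[float]], chunk_size: int, workers: int = 4
) -> List[float]:
    """Split the node dimension into chunks, reduce each chunk on a worker,
    then concatenate the per-chunk results in node order."""
    n_nodes = len(zeta[0])
    bounds = [(s, min(s + chunk_size, n_nodes)) for s in range(0, n_nodes, chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as `bounds`.
        parts = pool.map(lambda b: chunk_max(zeta, *b), bounds)
        return [v for part in parts for v in part]


if __name__ == "__main__":
    zeta = [[0.1, 0.5, 0.2],   # time step 0, three nodes
            [0.4, 0.3, 0.9],   # time step 1
            [0.2, 0.8, 0.1]]   # time step 2
    print(parallel_max_over_time(zeta, chunk_size=2))
```

Threads keep the sketch short, but pure-Python work is limited by the GIL; a Dask cluster instead runs chunk tasks in separate processes or on separate machines. With xarray backed by Dask, the same reduction is typically a single call such as `ds["zeta"].max(dim="time")`, with the variable name again assumed.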
Summary
Analysis, visualization, and distribution of coastal ocean model data are challenging due to the sheer size of the data involved, with regional simulations commonly in the 10 GB–1 TB range. Traditional approaches to data access are becoming time- and cost-inefficient, and they make collaborative research difficult, as the data needs to leave the system where it is stored before it can be analyzed. The Cloud and recent advances offer new opportunities for analysis, visualization, and distribution of model data, overcoming these problems [4] by allowing analysis and visualization to take place in the Cloud, close to the data, in an efficient and cost-effective way. They have lowered the barrier of entry and are poised to transform the work of regular scientists and engineers.
We converted the model output from NetCDF format to Zarr format, which was developed to allow Cloud-friendly access to n-dimensional array data. Zarr supports the major features of the HDF5 and NetCDF4 data models: self-describing datasets with variables, dimensions, and attributes, as well as groups, chunking, and compression. It is being developed in an open community fashion on GitHub, with contributions from multiple research organizations.
Dask workers perform operations in parallel, and Dask worker clusters can be created on local machines with multiple CPUs, on HPC with job submission, and on the Cloud via Kubernetes [17] orchestration of Docker [18] containers.
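Zarr's Cloud-friendliness comes from chunking: each chunk of an array is stored as a separate, independently compressed object, so a reader can fetch just the chunks overlapping the region it needs rather than a whole file. The toy below illustrates that layout; a Python dict stands in for a cloud object store, keys such as `"0.1"` loosely mimic Zarr v2's default chunk naming, and tiles are JSON-encoded for readability. The metadata is heavily simplified and the functions `write_chunked` and `read_region` are hypothetical; this is not the zarr library's API or on-disk format.

```python
"""Toy chunked array store, loosely modeled on Zarr's one-object-per-chunk layout.

Assumptions: a dict stands in for an object store; chunks are JSON + zlib
for readability (real Zarr stores binary buffers). Not the zarr API.
"""
import json
import zlib
from typing import Dict, List, Tuple


def write_chunked(array: List[List[float]], chunk: int, store: Dict[str, bytes]) -> None:
    """Split a 2-D array into chunk x chunk tiles; compress each under key 'i.j'."""
    rows, cols = len(array), len(array[0])
    store[".zarray"] = json.dumps({"shape": [rows, cols], "chunks": [chunk, chunk]}).encode()
    for i in range(0, rows, chunk):
        for j in range(0, cols, chunk):
            tile = [r[j:j + chunk] for r in array[i:i + chunk]]
            store[f"{i // chunk}.{j // chunk}"] = zlib.compress(json.dumps(tile).encode())


def read_region(
    store: Dict[str, bytes], r0: int, r1: int, c0: int, c1: int
) -> Tuple[List[List[float]], List[str]]:
    """Return (array[r0:r1, c0:c1], keys fetched), touching only overlapping chunks."""
    chunk = json.loads(store[".zarray"])["chunks"][0]
    out = [[0.0] * (c1 - c0) for _ in range(r1 - r0)]
    fetched: List[str] = []
    for ci in range(r0 // chunk, (r1 - 1) // chunk + 1):
        for cj in range(c0 // chunk, (c1 - 1) // chunk + 1):
            key = f"{ci}.{cj}"
            tile = json.loads(zlib.decompress(store[key]))
            fetched.append(key)
            # Copy only the cells of this tile that fall inside the requested region.
            for ti, row in enumerate(tile):
                gi = ci * chunk + ti
                if not r0 <= gi < r1:
                    continue
                for tj, value in enumerate(row):
                    gj = cj * chunk + tj
                    if c0 <= gj < c1:
                        out[gi - r0][gj - c0] = value
    return out, fetched


if __name__ == "__main__":
    arr = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    store: Dict[str, bytes] = {}
    write_chunked(arr, 2, store)
    print(read_region(store, 1, 3, 0, 2))  # reads 2 of the 4 data chunks
```

Requesting rows 1-2 and columns 0-1 touches only chunks `0.0` and `1.0`; on a real object store each key would be one HTTP request, which is what makes parallel, partial reads efficient.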