Abstract

With the increase in computational power, ocean models with kilometer-scale resolution have emerged over the last decade. These models have been used to quantify the energetic exchanges between spatial scales, inform the design of eddy parametrizations, and prepare observing networks. The increase in resolution, however, has drastically increased the size of model outputs, making the data difficult to transfer and analyze. Nonetheless, it is of primary importance to assess the realism of these models more systematically. Here, we showcase a cloud-based analysis framework proposed by the Pangeo Project that aims to tackle these distribution and analysis challenges. We analyze the output of eight submesoscale-permitting simulations, all on the cloud, for a crossover region of the upcoming Surface Water and Ocean Topography (SWOT) altimeter mission near the Gulf Stream separation. The models used in this study are run with the NEMO, CROCO, MITgcm, HYCOM, FESOM and FIO-COM code bases. The cloud-based analysis framework: i) minimizes the cost of duplicating and storing ghost copies of data, and ii) allows for seamless sharing of analysis results amongst collaborators. We describe the framework and provide example analyses (e.g., sea-surface height variability, submesoscale vertical buoyancy fluxes, and comparison with predictions from the mixed-layer instability parametrization). Basin-to-global-scale, submesoscale-permitting models are still at an early stage of development; their computational cost and carbon footprint are also rather large. It would, therefore, benefit the community to document the different model configurations for future best practices. We also argue that an emphasis on data analysis strategies will be crucial for improving the models themselves.
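As a concrete illustration of the workflow summarized above, the sketch below shows how cloud-hosted, analysis-ready output can be opened and reduced lazily with xarray and Dask, without downloading a local copy. The bucket path, store name and variable name ("ssh") are illustrative assumptions, not the actual datasets of this study.

```python
# Minimal sketch of the cloud-based analysis pattern, assuming an ARCO (Zarr)
# copy of a model's sea-surface height sits on cloud object storage.
# The gs:// path and the "ssh" variable name are hypothetical.
import xarray as xr

# Open the dataset lazily, directly from object storage (requires gcsfs for
# gs:// paths); no download, no local ghost copy. chunks={} keeps the
# on-disk chunking and hands the computation to Dask.
ds = xr.open_dataset(
    "gs://example-bucket/swot-crossover/model-ssh.zarr",
    engine="zarr",
    chunks={},
)

# Temporal variability of sea-surface height at each grid point,
# computed out-of-core across the chunked time dimension.
ssh_std = ds["ssh"].std(dim="time")
ssh_std.compute()
```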

Highlights

  • Collaboration amongst multiple ocean modelling institutions, and the reproduction of scientific results from numerical simulations, has traditionally required duplicating, sharing and downloading data, with each interested party then analyzing the data on their local workstation or cluster

  • As realistic ocean simulations with kilometric horizontal resolution have emerged (e.g., Rocha et al, 2016; Brodeau et al, 2020; Gula et al, 2021; Ajayi et al, 2021), such a workflow has become cumbersome, with tera- to petabytes of data needing to be transferred and stored as ghost copies

  • We would like to achieve the same goal as the Ocean Model Intercomparison Project (OMIP), but by inter-comparing submesoscale-permitting ocean models, which have been argued to be sensitive to grid-scale processes and numerical schemes as model resolution is pushed ever closer to the scales of non-hydrostatic dynamics and isotropic three-dimensional (3D) turbulence (Hamlington et al, 2014; Soufflet et al, 2016; Ducousso et al, 2017; Barham et al, 2018; Bodner and Fox-Kemper, 2020)



Introduction

Collaboration amongst multiple ocean modelling institutions, and the reproduction of scientific results from numerical simulations, has traditionally required duplicating, sharing and downloading data, with each interested party then analyzing the data on their local workstation or cluster. In collaboration with Pangeo Forge (Stern et al, 2022, https://pangeo-forge.readthedocs.io/en/latest/), we have attempted to fill this niche by streamlining the process of data preparation and submission. To transform their data into analysis-ready, cloud-optimized (ARCO) formats, data providers (ocean modelling institutions in our case) need only specify the source file locations (e.g., as paths on an FTP, HTTP or OPeNDAP server) along with output dataset parameters (e.g., the particular ARCO format and chunking) in a Python module known as a recipe; a minimal sketch of such a recipe is given below. The crowdsourcing approach of Pangeo Forge, to which any data provider can contribute, benefits both the immediate scientific needs of a single research project and the entire scientific community, in the form of shared, publicly accessible ARCO datasets which remain available for all to access. This saves each scientist the cost of duplicating and storing ghost copies of the data and allows for reproducible science.
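To make the recipe concept concrete, here is a minimal, hypothetical sketch in the style of the pangeo-forge-recipes 0.x Python API. The server URL, file-naming scheme, dates and chunk sizes are assumptions for illustration, not the recipes actually submitted by the modelling groups.

```python
# Hypothetical Pangeo Forge recipe (pangeo-forge-recipes 0.x style API).
# The source URL layout and target chunking below are illustrative only.
from pangeo_forge_recipes.patterns import ConcatDim, FilePattern
from pangeo_forge_recipes.recipes import XarrayZarrRecipe

def make_url(time):
    # One source file per day on the provider's HTTP server (assumed layout).
    return f"https://data.example-model-center.org/ssh/ssh_{time}.nc"

# Concatenate the daily source files along the "time" dimension.
time_concat = ConcatDim("time", keys=["2012-01-01", "2012-01-02", "2012-01-03"])
pattern = FilePattern(make_url, time_concat)

# Target ARCO dataset: a Zarr store, rechunked for time-series analysis.
recipe = XarrayZarrRecipe(pattern, target_chunks={"time": 30})
```

Once such a recipe is merged into a Pangeo Forge feedstock, the conversion to the ARCO store runs in the cloud, so the provider never has to stage or upload the transformed data themselves.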

Cloud-based JupyterHub
Example analyses
Surface diagnostics of the temporal mean and variability
Three-dimensional diagnostics on physical processes
Conditions for sustainability
Conclusions