Abstract
Scientific datasets from global-scale earth science models and remote sensing instruments are becoming available at greater spatial and temporal resolutions with shorter lag times. Water data are frequently stored as multidimensional arrays, also called gridded or raster data, which span two or three spatial dimensions, the time dimension, and other dimensions that vary by dataset. Water engineers and scientists need these data as model inputs, and their models generate results in the same formats. A myriad of file formats and organizational conventions exist for storing these array datasets. This variety does not make the data unusable, but it adds considerable difficulty because the structure of the data can differ from one source to the next. These storage formats are also largely incompatible with common geographic information system (GIS) software. Because multidimensional data are often spatial, this incompatibility adds complexity to extracting values, analyzing results, and otherwise working with the data. We present a Python package that provides a central interface for efficient access to multidimensional water data regardless of the file format. This research builds on and unifies existing file formats and software rather than proposing entirely new alternatives. We present a summary of the code design and validate the results using common water-related datasets and software.
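One common way to provide a single entry point over many storage formats is a registry that maps file suffixes to reader functions. The following is a minimal sketch of that idea under stated assumptions, not the package's actual interface: the names `register_reader` and `open_array` are hypothetical, and NumPy `.npy` files and comma-separated text stand in for the formats real water datasets use.

```python
"""Hypothetical sketch of a format-agnostic reader registry.

None of these names come from the package described in the paper; they only
illustrate dispatching on file suffix behind one entry point.
"""
import tempfile
from pathlib import Path
from typing import Callable, Dict

import numpy as np

# Maps a lowercase file suffix (e.g. ".npy") to a function that returns an array.
_READERS: Dict[str, Callable[[Path], np.ndarray]] = {}


def register_reader(*suffixes: str):
    """Decorator that registers a reader function for one or more suffixes."""
    def decorator(func: Callable[[Path], np.ndarray]) -> Callable[[Path], np.ndarray]:
        for suffix in suffixes:
            _READERS[suffix.lower()] = func
        return func
    return decorator


@register_reader(".npy")
def _read_npy(path: Path) -> np.ndarray:
    # NumPy's native binary array format.
    return np.load(path)


@register_reader(".csv", ".txt")
def _read_text_grid(path: Path) -> np.ndarray:
    # A single 2-D grid stored as comma-separated rows.
    return np.loadtxt(path, delimiter=",", ndmin=2)


def open_array(path) -> np.ndarray:
    """Open any registered format through a single entry point."""
    path = Path(path)
    try:
        reader = _READERS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"No reader registered for '{path.suffix}' files") from None
    return reader(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        grid = np.arange(6.0).reshape(2, 3)
        np.save(Path(tmp) / "grid.npy", grid)
        np.savetxt(Path(tmp) / "grid.csv", grid, delimiter=",")
        # Both files come back as the same array through one entry point.
        print(np.array_equal(open_array(Path(tmp) / "grid.npy"),
                             open_array(Path(tmp) / "grid.csv")))  # True
```

With this design, supporting another format only requires registering one more reader; code that calls `open_array` does not change.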
Highlights
This paper presents the design and development of a new method and its implementation in a Python package that addresses the practical difficulties in accessing, acquiring, and using gridded data in the water domain that stem from the lack of standards and the many competing formats.
We used a subset of the variables in the Global Forecast System (GFS) dataset to keep the file sizes comparable to the National Water Model (NWM) and GLDAS data, between
Summary
For many numerical models in the earth sciences, an important part of the input data is a time series of gridded spatial data representing a phenomenon at sequential time steps. Water models typically need a time series of values for input variables such as soil moisture, precipitation, surface runoff, or evapotranspiration, each of which is generated, archived, and distributed as raster datasets on large spatial or temporal domains [1,2]. These models often produce additional multidimensional datasets as results, such as the outputs of MODFLOW or SRH-2D models and the United States National Water Model [3,4,5]. These raster data, also called gridded data or multidimensional array data, are stored in an array structure and represent the variation of a variable with respect to each of its dimensions. In addition to the spatial and temporal dimensions, other dimensions, such as model realizations or ensemble members in stochastic models, may be used.
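To illustrate what "variation of a variable with respect to each of its dimensions" means in practice, the following minimal sketch extracts the time series at a point from an array whose last two axes are latitude and longitude. It assumes the data are already loaded as a NumPy array with 1-D coordinate vectors; the function `point_series` is hypothetical and is not taken from the package described here. It selects the nearest grid cell in each spatial coordinate and keeps every leading dimension (time, and any ensemble or realization axis).

```python
"""Hypothetical sketch: extract a point time series from a (..., lat, lon) grid."""
import numpy as np


def point_series(values: np.ndarray, lats: np.ndarray, lons: np.ndarray,
                 lat: float, lon: float) -> np.ndarray:
    """Return the values at the grid cell whose center is nearest (lat, lon).

    ``values`` has shape (..., lat, lon): the last two axes are spatial and any
    leading axes (time, ensemble member, ...) are kept in the result.
    Assumes the query uses the same longitude convention as ``lons``
    (e.g. -180..180); no wraparound at the antimeridian is handled.
    """
    if values.shape[-2:] != (lats.size, lons.size):
        raise ValueError("the last two axes must match the lat and lon coordinates")
    i = int(np.abs(lats - lat).argmin())  # index of the nearest latitude row
    j = int(np.abs(lons - lon).argmin())  # index of the nearest longitude column
    return values[..., i, j]


if __name__ == "__main__":
    lats = np.array([10.0, 20.0, 30.0])
    lons = np.array([-100.0, -90.0])
    # Synthetic (time=4, lat=3, lon=2) grid; the values are just 0..23.
    grid = np.arange(24, dtype=float).reshape(4, 3, 2)
    print(point_series(grid, lats, lons, lat=21.0, lon=-91.0).tolist())
    # [3.0, 9.0, 15.0, 21.0]
```

Keeping the leading axes with `...` means the same function returns a (time,) series for a 3-D grid and an (ensemble, time) array for a 4-D grid with an extra ensemble dimension.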