Development of climate data storage and processing model

I G Okladnikov,A G Titov,E P Gordov

doi:10.1088/1755-1315/48/1/012030

Abstract

We present a storage and processing model for climate datasets elaborated in the framework of a virtual research environment (VRE) for climate and environmental monitoring and analysis of the impact of climate change on the socio-economic processes on local and regional scales. The model is based on a «shared nothings» distributed computing architecture and assumes using a computing network where each computing node is independent and selfsufficient. Each node holds a dedicated software for the processing and visualization of geospatial data providing programming interfaces to communicate with the other nodes. The nodes are interconnected by a local network or the Internet and exchange data and control instructions via SSH connections and web services. Geospatial data is represented by collections of netCDF files stored in a hierarchy of directories in the framework of a file system. To speed up data reading and processing, three approaches are proposed: a precalculation of intermediate products, a distribution of data across multiple storage systems (with or without redundancy), and caching and reuse of the previously obtained products. For a fast search and retrieval of the required data, according to the data storage and processing model, a metadata database is developed. It contains descriptions of the space-time features of the datasets available for processing, their locations, as well as descriptions and run options of the software components for data analysis and visualization. The model and the metadata database together will provide a reliable technological basis for development of a high- performance virtual research environment for climatic and environmental monitoring.

Full Text