Abstract

Global climate modeling not only requires computation capabilities, but also brings tough challenges for data storage systems. The input and output data sets generally require hundreds or even thousands of terabytes storage. Therefore, storage reduction methods, such as content deduplication and various data compression methods, are extremely important for reducing the storage size requirement in climate modeling. However, little work has been done on investigating the effectiveness of these data reduction methods for climate data sets. In this paper, the potential benefit of data reduction for climate data is studied by investigating a total of 46.5 TB climate data sets, including 3 observation data sets (14.1 TB) and 3 climate model output data sets (32.4 TB). Five different data compression algorithms and two types of content deduplication mechanisms are applied to these data sets to study the possible data reduction effectiveness. Further more, the compressibility of different climate component data is also examined. Our work demonstrates the potential of applying data reduction methods in climate modeling platforms, and provides guidance for selecting the suitable methods for different kinds of climate data sets. We find that the compression method $${LCFP}$$ L C F P can provide the best compression ratio; however, its throughputs, especially the inflate throughputs are much lower than all the others. To strike a better balance between compression ratio and throughputs, we propose a new compression method for the model output data. The new compression method can achieve comparable compression ratio, while attain about 20 times higher inflate throughput than that of $${LCFP}$$ L C F P .

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.