Abstract
Efficiently extracting information from high dimensional hydro-meteorological modelling datasets requires smart solutions. Traditional methods are mostly based on files, which can be edited and accessed handily. But they have problems of efficiency due to contiguous storage structure. Others propose databases as an alternative for advantages such as native functionalities for manipulating multidimensional (MD) arrays, smart caching strategy and scalability. In this research, NetCDF file based solutions and the multidimensional array database management system (DBMS) SciDB applying chunked storage structure are benchmarked to determine the best solution for storing and querying 5D large hydrologic modelling dataset. The effect of data storage configurations including chunk size, dimension order and compression on query performance is explored. Results indicate that dimension order to organize storage of 5D data has significant influence on query performance if chunk size is very large. But the effect becomes insignificant when chunk size is properly set. Compression of SciDB mostly has negative influence on query performance. Caching is an advantage but may be influenced by execution of different query processes. On the whole, NetCDF solution without compression is in general more efficient than the SciDB DBMS.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: IOP Conference Series: Earth and Environmental Science
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.