Abstract

For multidimensional data, Space-Filling Curves (SFCs) have been used to improve the execution time of spatial data queries. However, their effect on compression, when they are used to reorder the uncompressed values, is less well understood. We investigate the impact of three SFCs on Shuttle Radar Topography Mission (SRTM) elevation data and Square Kilometre Array (SKA) telescope radio-astronomy data. These are two types of dataset to which SFCs have not been extensively applied in a compression context. This work contributes to understanding how such reorderings affect compression performance and how they interact with different compression schemes and preprocessing techniques. We present empirical results from combining eight common compression schemes with the Z-Order, Gray-Code, and Hilbert space-filling curves and the bitwise preprocessing technique BitShuffle. The Hilbert Curve consistently outperforms the other orderings on the SRTM dataset, though its mapping implementation incurs a significant speed penalty. However, the Z-Order and Gray-Code Curves perform best on the SKA dataset. Through an analysis of the datasets' autocorrelations, file entropies, and block entropies, we show that the Hilbert Curve exploits the SKA dataset's dimensional bias less than the Z-Order and Gray-Code Curves do. Conversely, the Hilbert Curve is the most appropriate for the SRTM dataset, as that data can be modelled as isotropic and has a significantly higher level of local autocorrelation. BitShuffle is necessary to compress the SKA data practically, and it also contributes to compression performance on the SRTM dataset. These curves and BitShuffle are advantageous in reducing block-entropy values for such datasets.
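BitShuffle and block entropy are central to the results summarized above, but the abstract does not define them operationally. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. `bitshuffle` transposes the bits of 16-bit unsigned values so that bit plane k holds bit k of every element, treating the whole array as a single block (the real BitShuffle library works on fixed-size blocks and handles leftover elements separately). `mean_block_entropy` is a hypothetical definition of block entropy as the average Shannon entropy of the bytes in fixed-size blocks. All function names and the 4096-byte default block size are illustrative assumptions.

```python
import math
from collections import Counter


def bitshuffle(values, width=16):
    """Bit-transpose unsigned integers: plane k collects bit k of every element.

    Simplified sketch (assumption): the whole array is one block, element j of
    each group of 8 is packed into bit j of a byte, and len(values) must be a
    multiple of 8.
    """
    n = len(values)
    if n % 8:
        raise ValueError("length must be a multiple of 8 in this sketch")
    out = bytearray()
    for k in range(width):              # one bit plane per bit position
        for start in range(0, n, 8):    # pack 8 elements into one byte
            byte = 0
            for j in range(8):
                byte |= ((values[start + j] >> k) & 1) << j
            out.append(byte)
    return bytes(out)


def bitunshuffle(data, n, width=16):
    """Inverse of bitshuffle for n elements of `width` bits each."""
    nbytes = n // 8                     # bytes per bit plane
    values = [0] * n
    for k in range(width):
        for bi in range(nbytes):
            byte = data[k * nbytes + bi]
            for j in range(8):
                values[bi * 8 + j] |= ((byte >> j) & 1) << k
    return values


def shannon_entropy(data):
    """Shannon entropy of a byte string, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in Counter(data).values())


def mean_block_entropy(data, block_size=4096):
    """Average per-block byte entropy (hypothetical definition of block entropy)."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        return 0.0
    return sum(shannon_entropy(b) for b in blocks) / len(blocks)


if __name__ == "__main__":
    # A slowly varying 16-bit toy signal (illustrative values only).
    values = [1000 + (i % 4) for i in range(64)]
    raw = b"".join(v.to_bytes(2, "little") for v in values)
    shuffled = bitshuffle(values)
    print(shannon_entropy(raw), shannon_entropy(shuffled))
```

For this toy input, the shuffled stream has a byte entropy of about 1.53 bits per byte against 2.0 for the raw little-endian bytes, because the constant high-order bits collapse into runs of identical bytes. The toy signal is illustrative only and says nothing about the SRTM or SKA results.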

Highlights

  • Space-Filling Curves are mappings between one-dimensional space and d-dimensional space. They are used to improve query times of spatial data structures by reordering and indexing the underlying values while preserving some spatial locality. Their d-dimensional orderings create curves that wrap around themselves and traverse local subregions, clustering nearby points together. As a result, some neighbouring d-dimensional values end up closer together in one-dimensional space than they would under a standard row-major (raster) scan; the extent of this depends on the type of curve.

  • As Space-Filling Curves (SFCs) map between d dimensions and one dimension, they are applicable to Machine Learning (ML) scenarios where data must be mapped into an alternative form appropriate for a given ML model, in some …

  • In this paper, we focus on three SFCs and on Row-Major Order, which we treat as a reference SFC called the Raster Curve or Raster Scan.
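To make the contrast between a raster scan and a locality-preserving curve concrete, the sketch below flattens a square grid either in row-major order or along a Hilbert curve. It uses the classic iterative coordinate-to-distance conversion for the Hilbert Curve and assumes a 2-D grid whose side length is a power of two. `raster_index`, `hilbert_index`, and `reorder` are hypothetical helper names; the paper's own mapping implementation is not described here.

```python
def raster_index(x, y, n):
    """Row-major (raster) position of cell (x, y) in an n x n grid; y is the row."""
    return y * n + x


def hilbert_index(x, y, n):
    """Position of (x, y) along a Hilbert curve over an n x n grid.

    Assumes n is a power of two. At each scale s, pick the quadrant, add the
    number of cells in the quadrants visited before it, then rotate/reflect
    the coordinates so the sub-square has the canonical orientation.
    """
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)    # cells in earlier quadrants
        if ry == 0:                     # rotate so the sub-curve is canonical
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def reorder(grid, index_fn):
    """Flatten a square grid (list of rows) into a 1-D list following index_fn."""
    n = len(grid)
    cells = [(x, y) for y in range(n) for x in range(n)]
    cells.sort(key=lambda c: index_fn(c[0], c[1], n))
    return [grid[y][x] for x, y in cells]


if __name__ == "__main__":
    grid = [[0, 1], [2, 3]]
    print(reorder(grid, raster_index))   # row by row
    print(reorder(grid, hilbert_index))  # U-shaped Hilbert traversal
```

Sorting every cell by its curve index is the simplest way to express the reordering, but it costs O(N log N) for N cells; a practical implementation would more likely compute the inverse mapping (distance to coordinates) directly or use lookup tables.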

Introduction

Space-Filling Curves are mappings between one-dimensional space and d-dimensional space, and are used to improve query times of spatial data structures by reordering and indexing the underlying values while preserving some spatial locality. Their d-dimensional orderings create curves that wrap around themselves and traverse local subregions, clustering nearby points together. Lebesgue discovered the Z-Order curve, which is formed by interleaving the bits of d integer coordinate values and results in an observable zigzag pattern [20]. It was popularized by Morton, who applied it to geodetic databases [21], and has been referred to as both the Lebesgue Curve and the Morton Curve. Although different curves achieve the best results in different prior studies, only the Z-Order, Gray-Code, and Hilbert Curves are covered in this paper, to limit its scope.
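The bit-interleaving construction of the Z-Order curve described above can be sketched in a few lines of Python. The sketch assumes non-negative integer coordinates that fit in `bits` bits and places bit b of the i-th coordinate at position b * d + i of the key; other bit-ordering conventions are equally common. It also includes one common formulation of the Gray-Code curve, assumed here rather than taken from the paper, in which the interleaved key is read as a reflected binary Gray codeword and decoded, so consecutive cells along the curve differ in exactly one interleaved bit. The function names are hypothetical.

```python
def morton_encode(coords, bits):
    """Z-Order (Morton) key: interleave the low `bits` bits of each coordinate.

    Bit b of coordinate i lands at bit position b * d + i of the key, so the
    first coordinate occupies the least significant slot of each group.
    """
    d = len(coords)
    key = 0
    for b in range(bits):
        for i, c in enumerate(coords):
            key |= ((c >> b) & 1) << (b * d + i)
    return key


def gray_to_binary(g):
    """Invert the reflected binary Gray code g = b ^ (b >> 1)."""
    b = 0
    while g:
        b ^= g      # accumulate g ^ (g >> 1) ^ (g >> 2) ^ ...
        g >>= 1
    return b


def gray_curve_index(coords, bits):
    """Gray-Code curve position (assumed formulation): decode the interleaved key."""
    return gray_to_binary(morton_encode(coords, bits))


if __name__ == "__main__":
    cells = [(x, y) for y in range(2) for x in range(2)]
    print(sorted(cells, key=lambda c: morton_encode(c, 1)))     # Z-Order visit order
    print(sorted(cells, key=lambda c: gray_curve_index(c, 1)))  # Gray-Code visit order
```

For example, `morton_encode((3, 5), 3)` interleaves `011` and `101` into `0b100111`, i.e. 39. On a 2 x 2 grid the Z-Order visits (0, 0), (1, 0), (0, 1), (1, 1), tracing the zigzag, while this Gray-Code formulation visits (0, 0), (1, 0), (1, 1), (0, 1). Because `coords` is any tuple, the same functions work for d > 2.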
