Transforming multi-dimensional data into a one-dimensional sequence using space-filling curves such as the Hilbert curve, the Gray curve, and the Z-curve has been studied extensively. These techniques are not sensitive to data or workload skewness; in practice, however, user-access patterns and data distributions are often highly skewed in high-dimensional space. It is therefore desirable to produce a one-dimensional sequence that keeps multi-dimensional grid cells that are queried together close to each other, which yields sequences with higher spatial locality. In this paper, we propose a workload-based approach to producing a one-dimensional ordering from multi-dimensional data. An extensive experimental evaluation suggests that our approach produces a high-quality ordering sequence. In the number of subsequences used to answer a query, it outperforms the existing state-of-the-art Hilbert curve by a factor of 4.84, the Gray curve by a factor of 6.66, and the Z-curve by a factor of 7.26; in IO time, it outperforms the Hilbert curve by a factor of 2.20, the Gray curve by a factor of 2.25, and the Z-curve by a factor of 2.38.
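
To make the subsequence metric concrete, the following minimal Python sketch maps 2-D grid cells to Z-curve (Morton) positions and counts how many contiguous runs of positions a rectangular query touches. It illustrates only the Z-curve baseline, not the workload-based approach proposed here. The function names (`morton_encode_2d`, `count_subsequences`, `query_runs`) are hypothetical, and reading "number of subsequences" as the number of maximal runs of consecutive one-dimensional positions is an assumption.

```python
def morton_encode_2d(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y (x on even bit positions, y on odd ones)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def count_subsequences(positions) -> int:
    """Count maximal runs of consecutive integers among 1-D positions.

    Assumption: each run corresponds to one subsequence that must be
    read to answer the query.
    """
    ordered = sorted(set(positions))
    if not ordered:
        return 0
    runs = 1
    for prev, cur in zip(ordered, ordered[1:]):
        if cur != prev + 1:
            runs += 1
    return runs


def query_runs(x_lo: int, x_hi: int, y_lo: int, y_hi: int,
               encode=morton_encode_2d) -> int:
    """Number of subsequences needed for the inclusive cell range
    [x_lo, x_hi] x [y_lo, y_hi] under the given cell ordering."""
    cells = [encode(x, y)
             for x in range(x_lo, x_hi + 1)
             for y in range(y_lo, y_hi + 1)]
    return count_subsequences(cells)


if __name__ == "__main__":
    print(query_runs(0, 1, 0, 1))  # aligned 2x2 query
    print(query_runs(1, 2, 1, 2))  # same-sized query, shifted by one cell
```

Under these assumptions, the aligned 2x2 query maps to positions 0 through 3 and needs a single subsequence, while the shifted query maps to positions 3, 6, 9, and 12 and needs four. A fixed curve cannot adapt to which of these queries the workload actually issues, which is the gap a workload-based ordering targets.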