Reduced Quotient Cube: Maximize Query Answering Capacity in OLAP

Quankun Wang, Benyuan Zou, Yu Chen, Lianyin Jia, Xingrui Huang, Jinguo You

doi:10.1109/access.2021.3120278

Open Access

https://doi.org/10.1109/access.2021.3120278

Abstract

The data cube is a critical tool for accelerating online analysis in big data, but its space overhead is exponential. The quotient cube, the main data cube compression approach, was proposed to significantly reduce the number of data cells by grouping cells that are aggregated over the same base tuple set, i.e., cells that are cover equivalent form an equivalence class. Nevertheless, efficiently analyzing massive data remains challenging because the quotient cube still consumes a large amount of storage space. This paper proposes the reduced quotient cube (RQC) based on the following observations: (i) a quotient cube contains equivalence classes of various sizes; (ii) small equivalence classes usually dominate; (iii) big equivalence classes are more capable of query answering, since they can induce more data cells. Unlike the quotient cube, which preserves all equivalence classes with equal priority, the reduced quotient cube preferentially preserves those with larger query answering capacity and smaller space occupancy. Further, we design efficient construction and query algorithms for it. Extensive experimental results show that, compared with the quotient cube, the reduced quotient cube occupies only 11.3% of the space while reaching a maximum query capacity of 95.9%. The query time of the reduced quotient cube is 51.24% lower on average than that of the quotient cube.
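
To make the notion of cover equivalence concrete, the following is a minimal sketch, not the paper's construction algorithm. It enumerates every cell of a tiny, made-up base table by brute force and groups cells that are aggregated over the same set of base tuples. The table contents and the names `cover_set` and `cover_equivalence_classes` are assumptions introduced for illustration only.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # wildcard meaning "aggregated over this dimension"


def cover_set(cell, table):
    """Return the indices of base tuples that match `cell` on every non-* dimension."""
    return frozenset(
        i
        for i, row in enumerate(table)
        if all(c == ALL or c == v for c, v in zip(cell, row))
    )


def cover_equivalence_classes(table):
    """Group all non-empty cells by the set of base tuples they cover."""
    n_dims = len(table[0])
    # Each dimension can take any value seen in the table, or the wildcard.
    domains = [sorted({row[d] for row in table}) + [ALL] for d in range(n_dims)]
    classes = defaultdict(list)
    for cell in product(*domains):
        covered = cover_set(cell, table)
        if covered:  # skip cells that cover no base tuple
            classes[covered].append(cell)
    return dict(classes)


# Hypothetical base table with three dimensions (store, product, season).
table = [
    ("S1", "P1", "spring"),
    ("S1", "P2", "spring"),
    ("S2", "P1", "fall"),
]

classes = cover_equivalence_classes(table)
sizes = sorted(len(cells) for cells in classes.values())
print(len(classes), "classes with sizes", sizes)  # 6 classes with sizes [1, 1, 3, 3, 4, 6]
```

Even on this toy table the 18 non-empty cells fall into 6 classes of unequal size, which is the kind of variation observations (i) to (iii) refer to. Brute-force enumeration is exponential in the number of dimensions and is used here only for clarity.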

Highlights

Despite the continuous development of big data and data warehouses in recent years [1]–[3], analyzing and processing massive data remains a major challenge.
We propose a reduced quotient cube model in which only the upper bound of each equivalence class in the data cube is preserved, and each upper bound either is covered by or covers another upper bound in the model.
The cover capacity of the equivalence classes in a quotient cube varies, which leads to inefficient use of limited storage space.

Summary

INTRODUCTION

Despite the continuous development of big data and data warehouses in recent years [1]–[3], analyzing and processing massive data remains a major challenge. The equivalence class C6 contains 3 data cells, but because the quotient cube must retain both the upper and lower bounds of an equivalence class when storing it, C6 occupies 3 data cells of storage for a query capability of 3. Based on observation of the structure of the reduced quotient cube, we derive a formula for the Ca value of an equivalence class. A statistical experiment on the distribution of Ca values across the equivalence classes of the quotient cube verifies that a large number of equivalence classes with small cover capacity do exist, and shows that the number of equivalence classes with small Ca values increases sharply as the dimensionality of the data set grows.
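
The trade-off between stored cells and answerable cells can be illustrated with a small sketch. It assumes that a quotient cube stores the upper bound and all lower bounds of each class, as stated above, and it counts query capability as the number of cells in the class. The two example classes are hypothetical (they are not the paper's C6), and the helper names `generalizes`, `bounds`, and `storage_and_capacity` are introduced here for illustration.

```python
ALL = "*"  # wildcard meaning "aggregated over this dimension"


def generalizes(a, b):
    """True if cell `a` equals `b` or is more general than `b`."""
    return all(x == ALL or x == y for x, y in zip(a, b))


def bounds(cells):
    """Return (upper bound, lower bounds) of one cover equivalence class."""
    # The upper bound is the most specific cell: every cell in the class generalizes it.
    upper = [c for c in cells if all(generalizes(o, c) for o in cells)]
    # A lower bound is a most general cell: no other cell in the class generalizes it.
    lower = [c for c in cells if not any(o != c and generalizes(o, c) for o in cells)]
    return upper[0], lower


def storage_and_capacity(cells):
    """Cells stored when keeping both bounds, and cells the class can answer."""
    upper, lower = bounds(cells)
    stored = {upper, *lower}
    return len(stored), len(cells)


# Hypothetical three-cell class: every cell is a bound, so nothing is saved.
small_class = [("S1", "P1", "spring"), ("S1", "P1", "*"), ("*", "P1", "spring")]

# Hypothetical six-cell class: three stored bounds answer six cells.
large_class = [
    ("S2", "P1", "fall"), ("S2", "P1", "*"), ("S2", "*", "fall"),
    ("S2", "*", "*"), ("*", "P1", "fall"), ("*", "*", "fall"),
]

print(storage_and_capacity(small_class))  # (3, 3)
print(storage_and_capacity(large_class))  # (3, 6)
```

Under these assumptions both classes cost the same storage, yet the larger class answers twice as many cells, which is the imbalance in cover capacity that the summary describes.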

PRELIMINARY

COMPRESSION ALGORITHM

QUERY ALGORITHM

QUERY PERFORMANCE OF REDUCED QUOTIENT CUBE MODEL

CONCLUSION AND FUTURE WORK