Multidimensional data analysis has attracted considerable research effort in recent years. One aspect that has been addressed is allowing users to analyze their data from different perspectives, each of which corresponds to a selected subset of dimensions. To optimize such analysis queries, precomputation and materialization are among the most studied solutions. In the context of skyline analysis, the skycube has been proposed as an optimization structure that lets users ask for the non-dominated records with respect to any selected set of dimensions. More precisely, given a set of dimensions D = {D1, …, Dd} and a relation T(id, D), the skycube of T is the set of all skylines obtained by considering each subset of D (subspace). To make the skycube practically useful, two lines of research have been pursued so far. The first aims to propose efficient algorithms for computing it. Note, however, that the number of these skylines is exponential w.r.t. |D|. Hence, in both execution time and storage space, these solutions struggle with even moderately large datasets, say |D| larger than 10 and more than 10^6 tuples. This motivated the second line of research, which proposes skycube summarization techniques to reduce both time and space consumption. Both lines of research store all or a summary of the following information: “for every tuple t, keep track of the dimension subsets X (subspaces) where t belongs to the respective skyline”. In this paper, we consider the complementary statement, i.e., “for every tuple t, we store a compact data structure encoding the subspaces X with respect to which t is dominated”. This is what we call the negative skycube. Despite the apparent equivalence between the two statements (dominated vs. not dominated), our analysis and extensive experiments show that these two points of view do not lead to the same behavior of the related algorithms.
More specifically, we show that: (i) the negative summary can be obtained much faster than with state-of-the-art techniques for positive summaries, (ii) in general, it consumes less memory, (iii) skyline query evaluation using this summary is much faster, (iv) the positive skycube can be obtained more rapidly than with state-of-the-art algorithms specifically designed for this purpose, and (v) it is highly effective with respect to insertions and deletions.
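To make the two points of view concrete, the following minimal Python sketch computes, for a toy relation, the set of subspaces in which each tuple is dominated, and then answers a skyline query from that information. It is only an illustration of the definitions above, not the compact structure or the algorithms proposed in the paper: the function names (`subspaces`, `dominates`, `negative_view`, `skyline`), the toy data, the convention that smaller values are preferred, and the explicit (uncompressed, exponential-size) storage of subspaces are all assumptions made for this example.

```python
from itertools import combinations


def subspaces(d):
    """Yield every non-empty subset of the dimension indices 0..d-1 as a tuple."""
    for k in range(1, d + 1):
        yield from combinations(range(d), k)


def dominates(a, b, subspace):
    """True if a dominates b on `subspace` (assumption: smaller is better)."""
    return (all(a[i] <= b[i] for i in subspace)
            and any(a[i] < b[i] for i in subspace))


def negative_view(table):
    """Map each tuple id to the set of subspaces in which it is dominated.

    Naive and uncompressed: every subspace is stored explicitly.
    """
    d = len(next(iter(table.values())))
    return {
        tid: {X for X in subspaces(d)
              if any(dominates(other, t, X)
                     for oid, other in table.items() if oid != tid)}
        for tid, t in table.items()
    }


def skyline(table, negative, subspace):
    """A tuple is in the skyline of `subspace` iff it is not dominated there."""
    return sorted(tid for tid in table if subspace not in negative[tid])


# Hypothetical toy relation T(id, D1, D2).
table = {"t1": (1, 3), "t2": (2, 2), "t3": (3, 1), "t4": (3, 3)}
negative = negative_view(table)

for X in subspaces(2):
    print(X, skyline(table, negative, X))
```

In this toy relation, `t4` is dominated in every subspace, so it appears in no skyline, whereas `t1` is dominated only on the second dimension alone. The positive view would instead record, for each tuple, the subspaces where it belongs to the skyline; the two are complements of each other over the set of all subspaces.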