The Lattice Histogram is a recently proposed data summarization technique that achieves approximation quality preferable to that of an optimal plain histogram. Like other hierarchical synopsis methods, a lattice histogram (LH) aims to approximate data using a hierarchical structure. Still, this structure is not defined a priori; it consists an unknown , not a given, of the problem. Past work has defined the properties that an LH needs to obey and developed general-purpose approximation algorithms for the construction thereof. Still, two major issues remain unaddressed: First, the construction of an optimal LH for a given error metric is a problem unsolved to date. Second, the proposed algorithms suffer from too high space and time complexities that render their application in real-world settings problematic. In this paper, we address both these questions, focusing on the case that the target error metric is a maximum error metric. Our algorithms treat both the error-bounded LH construction problem, in which the space occupied by an LH is minimized under an error constraint, as well as the classic space-bounded problem. First, we develop a dynamic-programming scheme that detects an optimal LH under a given maximum-error bound. Second, we propose an efficient, practical, greedy algorithm that solves the same problem with much lower time and space requirements. Then, we show how both our algorithms can be applied to the classic space-bounded problem, aiming at minimizing error under a bound on space. Our experimental study with real-world data sets shows the effectiveness of our methods compared to competing summarization techniques. Moreover, our findings show that our greedy heuristic performs almost as well as the optimal solution in terms of accuracy.
Read full abstract