A characterization of hierarchical computable distance functions for data warehouse systems

Matteo Golfarelli, Elisa Turricchia

doi:10.1016/j.dss.2014.03.011

Abstract

A data warehouse is a huge multidimensional repository used for statistical analysis of historical data. In a data warehouse, events are modeled as multidimensional cubes in which cells store numerical indicators, while dimensions describe the events from different points of view. Dimensions are typically described at different levels of detail through hierarchies of concepts. Computing the distance/similarity between two cells has several applications in this domain. In this context, distance is typically based on the least common ancestor between attribute values, but the effectiveness of such distance functions varies with the structure and the number of the hierarchies involved. In this paper, we propose a characterization of hierarchy types based on their structure and expressiveness, provide a characterization of the different types of distance functions, and verify their effectiveness on different types of hierarchies in terms of their intrinsic discriminant capacity.

Full Text