A dimension in a data warehouse (DW) is a set of elements connected by a hierarchical relationship. The elements are used to view summaries of data at different levels of abstraction. In order to support an efficient processing of such summaries, a dimension is usually required to satisfy different classes of integrity constraints. In scenarios where the constraints properly capture the semantics of the DW data, but they are not satisfied by the dimension, the problem of repairing (correcting) the dimension arises. In this paper, we study the problem of repairing a dimension in the context of two main classes of integrity constraints: strictness and covering constraints. We introduce the notion of minimal repair of a dimension: a new dimension that is consistent with respect to the set of integrity constraints, which is obtained by applying a minimal number of updates to the original dimension. We study the complexity of obtaining minimal repairs, and show how they can be characterized using Datalog programs with weak constraints under the stable model semantics.
Read full abstract