Characterizing schema mappings via data examples

Balder TEN Cate,Phokion G. Kolaitis,Bogdan Alexe,Wang-Chiew Tan

doi:10.1145/2043652.2043656

Abstract

Schema mappings are high-level specifications that describe the relationship between two database schemas; they are considered to be the essential building blocks in data exchange and data integration, and have been the object of extensive research investigations. Since in real-life applications schema mappings can be quite complex, it is important to develop methods and tools for understanding, explaining, and refining schema mappings. A promising approach to this effect is to use “good” data examples that illustrate the schema mapping at hand. We develop a foundation for the systematic investigation of data examples and obtain a number of results on both the capabilities and the limitations of data examples in explaining and understanding schema mappings. We focus on schema mappings specified by source-to-target tuple generating dependencies (s-t tgds) and investigate the following problem: which classes of s-t tgds can be “uniquely characterized” by a finite set of data examples? Our investigation begins by considering finite sets of positive and negative examples, which are arguably the most natural choice of data examples. However, we show that they are not powerful enough to yield interesting unique characterizations. We then consider finite sets of universal examples, where a universal example is a pair consisting of a source instance and a universal solution for that source instance. We first show that unique characterizations via universal examples is, in a precise sense, equivalent to the existence of Armstrong bases (a relaxation of the classical notion of Armstrong databases). After this, we show that every schema mapping specified by LAV s-t tgds is uniquely characterized by a finite set of universal examples with respect to the class of LAV s-t tgds. Moreover, this positive result extends to the much broader classes of n -modular schema mappings, n a positive integer. Finally, we study the unique characterizability of GAV schema mappings. It turns out that some GAV schema mappings are uniquely characterizable by a finite set of universal examples with respect to the class of GAV s-t tgds, while others are not. By unveiling a tight connection with homomorphism dualities, we establish an effective, sound, and complete criterion for determining whether or not a GAV schema mapping is uniquely characterizable by a finite set of universal examples with respect to the class of GAV s-t tgds.

Full Text