Structures, Semantics and Statistics

Alon Y Halevy

doi:10.1016/b978-012088469-8.50003-6

Abstract

This chapter discusses data integration and its challenges. Data integration is a pervasive challenge faced in data management applications. It is crucial in large enterprises that own a multitude of data sources and for progress in large-scale scientific projects, where data sets are produced independently by multiple researchers. At a fundamental level, the key challenge in data integration is to reconcile the semantics of disparate data sets, each expressed with a different database structure. Computing statistics over a large number of structures offers a powerful methodology for producing semantic mappings, the expressions that specify such reconciliation. The statistics offer hints about the semantics of the symbols in the structures, thereby enabling the detection of semantically similar concepts. The same methodology can be applied to several other data management tasks that involve search in a space of complex structures, and to enabling the next generation of on-the-fly data integration systems.

Full Text