Semantic Integration of Open-Data Tables

Asha Subramanian, Srinath Srinivasa, Janaki Vinesh Joshi, Ved Kurien Mathai, Vikkurthi Manikanta

doi:10.1007/978-3-319-48472-3_35

Abstract

With vast amounts of tabular data freely available under several Open-Data initiatives, semantic integration of such datasets is a pressing need. Multiple research efforts have addressed the problem of annotating tabular data. However, to the best of our knowledge, they do not adequately address the problem of semantically integrating multiple tables. A given collection of tables can be semantically integrated along several perspectives or themes, which makes semantic integration a “divergent aggregation” problem. Most existing approaches have focused on interpreting a single table, or on rewriting tables to describe an overarching theme that is already provided. In this work, we address semantic integration at two levels: theme identification (identifying the dominant topics or perspectives through which the data can be characterized) and schematic characterization (identifying the classes, relationships and instances that best characterize the data within the theme). The theme need not be represented by a single column, and may span multiple columns or tables. We use the Linked Open Data (LOD) cloud to identify ontologies that best suit the datasets. Our work also identifies incoherent collections, in which the tables may not share common topics. In such cases, we provide guidance based on the intersection of the semantic footprints of the tables, so that datasets can be judiciously selected for semantic integration.
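The idea of intersecting semantic footprints can be illustrated with a minimal sketch. It assumes, purely for illustration, that each table's footprint is a set of candidate concept labels. The abstract does not specify this representation, and the function names, table names and labels below are all hypothetical, not taken from the paper.

```python
from itertools import combinations


def footprint_overlap(footprints):
    """Return the concepts shared by each pair of tables.

    `footprints` maps a table name to a set of candidate concept labels
    (an assumed representation of a table's semantic footprint).
    """
    return {
        (a, b): footprints[a] & footprints[b]
        for a, b in combinations(sorted(footprints), 2)
    }


def common_concepts(footprints):
    """Return the concepts shared by every table in the collection.

    An empty result suggests that the collection has no common topic.
    """
    sets = list(footprints.values())
    return set.intersection(*sets) if sets else set()


# Hypothetical footprints for three tables (illustrative labels only).
footprints = {
    "rainfall_by_district": {"AdministrativeRegion", "Place", "Year"},
    "literacy_by_district": {"AdministrativeRegion", "Place", "Language"},
    "film_awards": {"Film", "Person", "Year"},
}

overlap = footprint_overlap(footprints)
shared_by_all = common_concepts(footprints)
```

In this toy collection no concept is shared by all three tables, while the pairwise overlaps indicate that the two district tables are better candidates for integration with each other than with the film table.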

Full Text