Modeling Semantic Heterogeneity in Dataspace: A Machine Learning Approach

Mrityunjay Singh, V.K. Panchal, S.K. Jain

doi:10.1109/icit.2014.24

Abstract

A data space system offers a new way to share and integrate information among distributed, autonomous, and heterogeneous data sources. To provide best-effort answers to user queries, a data space system must resolve semantic heterogeneity at its core. Many solutions have been proposed to address this problem. We are exploring the problem of semantic heterogeneity in data space systems as part of our PhD work. In this paper, we address semantic heterogeneity in the context of a data space system and present an abstract framework for modeling it. The proposed model is based on machine learning and ontology approaches. The machine learning technique analyzes semantically equivalent data items (or entities) in a data space, and the ontology conceptualizes the structural entities in a data space. The model resolves the semantic heterogeneity of a data space system and creates a conceptual model using a "from-data-to-schema" approach. The proposed approach implicitly creates the domain ontology by finding the most similar concepts coming from different data sources, and improves the performance of the system by finding the semantic relationships among them.

Full Text