Abstract
Schema matching is a long-standing challenge in many database-related applications, such as data integration, where two databases with different schemas have to be integrated. With the evolution from databases to big data, schema matching has taken on new purposes and application contexts, ranging from data integration, to service integration, to semantic data clouding, and more recently to exploratory data analysis over big data. These contexts increase the demand for schema matching between semantic data types such as XML and RDF. Existing integration approaches have not dealt with the challenges of defining a relation between XML and other semantic data types. To address these challenges, this paper studies the problem of schema mapping from XML to RDF in two parts. First, we test the validity of a single matcher applied in a column-based manner to semantic data types. Second, we test the validity of a highly configurable framework that uses hierarchical classification to construct a composable pipeline. We propose and implement a Reconfigurable Pipeline for Semi-Automatic Schema Matching (REPSASM), which aims to address the customizability of the matching problem by providing an environment in which users can create, configure, and experiment with their own schema-matching procedure. The experiments performed in this work show that configurability and hierarchical classification improve the matching result, and we propose an algorithm to automatically optimize such a hierarchical pipeline.
Highlights
Schema matching is a principal problem in many database-related applications, such as data integration, where two databases with different schemas have to be integrated.
New application contexts increase the demand for schema matching between semantic data types such as XML and RDF.
We propose and implement a Reconfigurable Pipeline for Semi-Automatic Schema Matching (REPSASM), where a pipeline is, in this context, a chain of matchers used to classify data.
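The idea of a chain of matchers combined with hierarchical classification can be illustrated with a minimal sketch. This is not the REPSASM implementation: the matcher functions, the label names, and the `classify` helper are all hypothetical and chosen only for illustration. Each matcher inspects the values of one column and either returns a label or declines; a top-level chain assigns a coarse class, and an optional child chain refines it.

```python
import re
from typing import Callable, Dict, List, Optional

# A matcher looks at the values of one column and returns a label,
# or None when it cannot decide. All names here are illustrative.
Matcher = Callable[[List[str]], Optional[str]]


def date_matcher(values: List[str]) -> Optional[str]:
    ok = values and all(re.fullmatch(r"\d{4}-\d{2}-\d{2}", v) for v in values)
    return "date" if ok else None


def numeric_matcher(values: List[str]) -> Optional[str]:
    ok = values and all(re.fullmatch(r"-?\d+(\.\d+)?", v) for v in values)
    return "numeric" if ok else None


def text_matcher(values: List[str]) -> Optional[str]:
    # Fallback: any non-empty column is at least text.
    return "text" if values else None


def integer_matcher(values: List[str]) -> Optional[str]:
    ok = values and all(re.fullmatch(r"-?\d+", v) for v in values)
    return "integer" if ok else None


def decimal_matcher(values: List[str]) -> Optional[str]:
    # Fallback inside the numeric branch.
    return "decimal" if values else None


def run_chain(matchers: List[Matcher], values: List[str]) -> Optional[str]:
    """Return the label of the first matcher in the chain that decides."""
    for matcher in matchers:
        label = matcher(values)
        if label is not None:
            return label
    return None


def classify(
    values: List[str],
    top: List[Matcher],
    children: Dict[str, List[Matcher]],
) -> Optional[str]:
    """Two-level classification: a coarse label, refined by a child chain."""
    coarse = run_chain(top, values)
    if coarse is None:
        return None
    fine = run_chain(children.get(coarse, []), values)
    return f"{coarse}/{fine}" if fine else coarse


# The pipeline is plain data, so it can be reordered or extended.
TOP = [date_matcher, numeric_matcher, text_matcher]
CHILDREN = {"numeric": [integer_matcher, decimal_matcher]}

if __name__ == "__main__":
    for column in (["1", "2", "3"], ["1.5", "2"], ["2020-01-02"], ["abc"]):
        print(column, "->", classify(column, TOP, CHILDREN))
```

Because the chains are ordinary lists and the hierarchy is an ordinary dictionary, reconfiguring the pipeline in this sketch amounts to editing `TOP` and `CHILDREN`, which is the kind of user-level customization the highlights describe.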