Abstract

Attribute-level schema matching is a critical step in numerous database applications, such as DataSpaces, Ontology Merging and Schema Integration. Much research exists on this topic; however, it ignores the implicit categorical information that is crucial for finding high-quality matches between schema attributes. In this paper, we discover the categorical semantics implicit in source instances and associate them with the matches in order to improve the overall quality of schema matching. Our method works in three phases. The first phase is a pre-detection step that detects the possible categories of source instances by using clustering techniques. In the second phase, we employ information entropy to find the attributes whose instances imply the categorical semantics. In the third phase, we introduce a new concept, the c-mapping, to represent the associations between the matches and the categorical semantics. We then employ an adaptive scoring function to evaluate the c-mappings and thereby associate the matches with the semantics. Moreover, we show how to translate the matches with semantics into schema mapping expressions, and use the chase procedure to transform source data into target schemas. An experimental study shows that our approach is effective and has good performance.

Keywords: Cluster Technique, Minimum Span Tree, Categorical Semantic, Target Attribute, Categorical Attribute

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
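
The second phase relies on information entropy to pick out attributes whose instances carry categorical semantics. The abstract does not say how the entropy values are used, so the following is only a minimal sketch under an assumption: an attribute whose values have low but non-zero Shannon entropy is treated as a candidate categorical attribute. The function names, the sample rows and the `max_entropy` threshold are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def attribute_entropy(values):
    """Shannon entropy (in bits) of the value distribution of one attribute."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def candidate_categorical_attributes(rows, max_entropy=1.5):
    """Return attributes whose entropy is non-zero and at most max_entropy.

    The threshold is an illustrative assumption, not a value from the paper.
    Constant attributes (entropy 0) are skipped because they cannot
    distinguish categories.
    """
    result = []
    for attr in rows[0].keys():
        h = attribute_entropy([row[attr] for row in rows])
        if 0 < h <= max_entropy:
            result.append(attr)
    return result


# Hypothetical source instances: "type" repeats, "id" and "title" do not.
rows = [
    {"id": 1, "type": "book", "title": "A"},
    {"id": 2, "type": "book", "title": "B"},
    {"id": 3, "type": "cd", "title": "C"},
    {"id": 4, "type": "cd", "title": "D"},
]

print(candidate_categorical_attributes(rows))
```

In this toy data, `id` and `title` take a distinct value in every row and so have the maximum entropy for four rows, while `type` splits the rows into two equal groups, which is the kind of attribute the second phase is looking for.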
