Abstract

We propose a new approach to the problem of schema matching in relational databases that merges the hybrid and composite approach of combining multiple individual matching techniques. In particular, we propose assigning individual matchers to two categories, “strong” matchers that provide apriori higher quality matches, and “weak” matchers that may be more sensitive to the inputs and are less reliable but can still help generate some matches. Matching is correspondingly done in two phases, with strong “matches” being produced by strong matchers being combined using a simple voting combiner, and weak matchers providing additional evidence for attributes left unmatched (again using a voting combiner). We observe that, while many recent advances in schema matching [2][5][7][11] use composite schema matching and rely on the existence of training schemas to train combiners, in many real-world situations it is not feasible to employ learning techniques because of the unavailability of training data (i.e., schemas or instance data.) We hypothesize that “weak” matchers can often hurt overall accuracy if used in a “single-phase” composite matcher that does not employ learning techniques. We implement our two-stage approach in the ASID system and evaluate it using real life schemas. The experiments validate our hypothesis regarding the negative effect of “weak” matchers and also show ASID performs comparably to state of the art systems while requiring no training schemas. We also demonstrate the benefits of a simple documentation-based matcher. Our experimental data included schemas ranging from 20 to 120 attributes. Note that schemas with 120 attributes are as large or larger than other published evaluations of relational schema matching.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.