QMatch - A Hybrid Match Algorithm for XML Schemas

K.T. Claypool, V. Hegde, N. Tansalarak

doi:10.1109/icde.2005.272

Abstract

Integration of multiple heterogeneous data sources continues to be a critical problem for many application domains and a challenge for researchers worldwide. With the increasing popularity of the XML model and the proliferation of XML documents online, automated matching of XML documents and databases has become a critical problem. In this paper, we present QMatch, a hybrid schema match algorithm that provides a unique framework for combining existing structural and linguistic algorithms while exploiting additional information inherent in XML documents, such as the order of XML elements, to improve the level of matching between two given XML Schemas. QMatch is based on an extension of our previous work on QoM, a Quality of Match metric that measures the "goodness" of a match between two UML-based schemas. We now extend the QoM taxonomy to encapsulate the richness of information captured in XML Schemas and provide a qualitative and quantitative analysis of the information capacity of XML Schemas. QoM provides not only a means of tuning existing schema match algorithms to output at desired levels of matching but also an effective basis for a new schema match algorithm. We show via a set of experiments the benefits of QMatch over individual structural and linguistic algorithms for schema matching, and provide an empirical measure of the accuracy of QMatch in terms of the true positives discovered by the algorithm.
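
To make the idea of a hybrid match concrete, the following is a minimal Python sketch of how a linguistic score (label similarity), a structural score (how well child elements pair up), and an order score (whether paired children sit at the same position) might be combined into one value. It is an illustration only and not the QMatch algorithm or the QoM taxonomy from the paper: the `Element` class, the `hybrid_score` function, the weights, the threshold, and the greedy child pairing are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Element:
    """Hypothetical stand-in for an XML Schema element: a label plus ordered children."""
    label: str
    children: list[Element] = field(default_factory=list)


def label_similarity(a: str, b: str) -> float:
    """Linguistic component: case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def hybrid_score(src: Element, tgt: Element,
                 w_label: float = 0.4, w_children: float = 0.4,
                 w_order: float = 0.2, threshold: float = 0.6) -> float:
    """Combine label, children, and order evidence (weights are assumed, not from the paper)."""
    label = label_similarity(src.label, tgt.label)
    if not src.children and not tgt.children:
        return label  # leaves carry only linguistic evidence in this sketch

    # Structural component: greedily pair each source child with the
    # best-scoring unused target child whose score reaches the threshold.
    pairs = []
    used = set()
    for i, s_child in enumerate(src.children):
        best_j, best = None, 0.0
        for j, t_child in enumerate(tgt.children):
            if j in used:
                continue
            s = hybrid_score(s_child, t_child, w_label, w_children, w_order, threshold)
            if s >= threshold and s > best:
                best_j, best = j, s
        if best_j is not None:
            used.add(best_j)
            pairs.append((i, best_j, best))

    denom = max(len(src.children), len(tgt.children))
    children = sum(score for _, _, score in pairs) / denom
    # Order component: fraction of children paired at the same position.
    order = sum(1 for i, j, _ in pairs if i == j) / denom
    return w_label * label + w_children * children + w_order * order


book = Element("book", [Element("title"), Element("author")])
reordered = Element("book", [Element("author"), Element("title")])
unrelated = Element("invoice", [Element("amount")])

print(hybrid_score(book, book))
print(hybrid_score(book, reordered))
print(hybrid_score(book, unrelated))
```

In this sketch, two schemas with the same labels and children but a different child order score lower than an exact match, which is one simple way element order can contribute evidence beyond purely structural or linguistic comparison.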
