Abstract

XML, the Extensible Markup Language, is the standard exchange format for modern information systems, Service-Oriented Architecture (SOA), and the Semantic Web. Hence, comparing XML documents has become a necessary task. It is needed for tracking and merging changes between versions of the same document. It is also needed for translating between documents that refer to the same information but comply with different schemata or originate from different parties. In this setting, given two documents, XML differencing is the process of finding an edit sequence that, when applied to the first document, yields the second. Such a sequence consists of exact-match, approximate-match, deletion, and insertion operations. In practice, domain-specific differencing solutions are expensive to develop and hard to reuse. A generic differencing approach able to serve various domains would therefore be both useful and cost-effective. This thesis presents VTracker, a generic XML differencing approach that captures domain knowledge and semantics through a configurable, domain-specific cost function. VTracker views an XML document as an ordered labeled tree. Given two XML-document trees and a cost function, VTracker calculates the tree-edit distance, that is, the minimum total cost of the edit operations needed to transform one tree into the other. The first contribution is an automatic method for synthesizing such a cost function from the domain's XML Schema Definition (XSD). Second, VTracker considers the XML reference structure in addition to the natural XML containment structure. Third, VTracker implements an affine-cost policy that prefers edit operations applied to neighboring elements over those applied to dispersed ones. Finally, VTracker uses a set of simplicity heuristics to select the best edit script when several scripts with the same minimum cost are found.
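To make the notion of a tree-edit distance parameterized by a cost function concrete, here is a minimal sketch. It is not VTracker's implementation. It uses the textbook recursive forest-distance recurrence for ordered labeled trees, which is the formulation underlying Zhang-Shasha-style algorithms, and it memoizes that recurrence. The function name `tree_edit_distance`, the `(label, children)` tuple representation, and the `delete`/`insert`/`relabel` cost parameters are all hypothetical. XML parsing, XSD-derived costs, reference edges, the affine-cost policy, and the simplicity heuristics are deliberately omitted.

```python
from functools import lru_cache
from typing import Callable, Tuple

# Hypothetical representation: an ordered labeled tree is (label, children),
# where children is a tuple of trees; a forest is a tuple of trees.
Tree = Tuple[str, tuple]
Forest = Tuple[Tree, ...]


def tree_edit_distance(
    t1: Tree,
    t2: Tree,
    delete: Callable[[str], float] = lambda a: 1.0,
    insert: Callable[[str], float] = lambda b: 1.0,
    relabel: Callable[[str, str], float] = lambda a, b: 0.0 if a == b else 1.0,
) -> float:
    """Minimum total cost to transform t1 into t2 under the given cost function."""

    @lru_cache(maxsize=None)
    def dist(f: Forest, g: Forest) -> float:
        if not f and not g:
            return 0.0
        if not g:
            v = f[-1]  # rightmost root of f
            # Deleting v promotes its children into v's position.
            return dist(f[:-1] + v[1], g) + delete(v[0])
        if not f:
            w = g[-1]  # rightmost root of g
            return dist(f, g[:-1] + w[1]) + insert(w[0])
        v, w = f[-1], g[-1]
        return min(
            dist(f[:-1] + v[1], g) + delete(v[0]),  # delete v
            dist(f, g[:-1] + w[1]) + insert(w[0]),  # insert w
            # Match v with w: their child forests and the remaining
            # sibling forests are then aligned independently.
            dist(v[1], w[1]) + dist(f[:-1], g[:-1]) + relabel(v[0], w[0]),
        )

    return dist((t1,), (t2,))


# Usage: a WSDL-like element gains a new child between two existing ones.
old = ("operation", (("input", ()), ("output", ())))
new = ("operation", (("input", ()), ("fault", ()), ("output", ())))
print(tree_edit_distance(old, new))  # 1.0 (one insertion)

# A hypothetical "domain" cost that makes relabeling expensive, so that
# a delete-plus-insert (cost 2.0) is preferred over a relabel (cost 3.0).
strict = lambda a, b: 0.0 if a == b else 3.0
print(tree_edit_distance(("a", ()), ("b", ()), relabel=strict))  # 2.0
```

Passing the costs as functions is the design point this sketch illustrates. The same distance procedure serves any domain, and a domain is supported by supplying different cost functions. In VTracker's case, according to the abstract, the cost function is synthesized from the domain's XSD.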
VTracker was applied to a variety of domains, namely OWL/RDF, WSDL, BPEL, UML/XMI, XHTML, and RNA secondary structure, where it performed competitively with, or even better than, state-of-the-art methods in each of these domains.
