Abstract

Due to the heterogeneous nature of XML data for internet applications exact matching of queries is often inadequate. The need arises to quickly identify subtrees of XML documents in a collection that are similar to a given pattern. Similarity involves both tags, that are not required to coincide, and structure, in which not all the relationships among nodes in the tree structure are strictly preserved. In this paper we present an efficient approach to the identification of similar subtrees, relying on ad-hoc indexing structures. The approach allows to quickly detect, in a heterogeneous document collection, the minimal portions that exhibit some similarity with the pattern. These candidate portions are then ranked according to their actual similarity. The approach supports different notions of similarity, thus it can be customized to different application domains. In the paper, three different similarity measures are proposed and compared. The approach is experimentally validated and the experimental results are extensively discussed.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.