A Prufer Sequence Based Approach to Measure Structural Similarity of XML Documents

Ramanathan Periakaruppan, Rethinaswamy Nadarajan

doi:10.1007/978-3-642-41033-8_81

Abstract

XML is a W3C standard for the exchange of semi-structured data. Many applications need to extract information from semi-structured data, which is a complex task. In this paper we address the problem of computing the structural similarity of XML documents, which plays a crucial role in the clustering process. Previous path-based approaches fail to capture the sibling relationships among nodes, and they ignore the similarity between paths whose nodes are not in the same order but still convey the same semantics. A further weakness of these approaches is that, in a partial path match, level information is not taken into account when the nodes being compared appear at different hierarchical levels. To address these issues, we describe a method based on the Prufer sequence for measuring the structural similarity of XML documents. The benefit of a Prufer sequence based representation is that it stores the ancestor-descendant and sibling relations. XML trees are encoded as Prufer sequences, which establishes a one-to-one mapping between an XML tree and its sequence. Instead of extracting all paths, only the common nodes are extracted, based on the Prufer sequence code. We have devised an algorithm that computes similarity by exploring all relations among the common nodes, namely parent-child, ancestor-descendant, and sibling. The experimental results show that the proposed approach is effective.
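To make the encoding idea concrete, here is a minimal Python sketch of one common way to build a Prufer-style sequence for an XML tree. It is an illustration under stated assumptions, not the authors' algorithm: nodes are numbered in post-order, the smallest-numbered leaf is deleted repeatedly, and its parent's number is recorded until only the root remains (so the sequence has n - 1 entries, one more than the classic Prufer construction, which stops at two nodes). The function names and the `nps`/`lps` variables (numbered and labeled sequences) are hypothetical names chosen for this example.

```python
import heapq
from xml.etree import ElementTree as ET


def number_postorder(xml_text):
    """Return (tags, parent_of), both keyed by post-order node number."""
    tags, parent_of = {}, {}
    counter = 0

    def visit(node):
        nonlocal counter
        child_numbers = [visit(child) for child in node]
        counter += 1                      # this node's post-order number
        tags[counter] = node.tag
        for c in child_numbers:
            parent_of[c] = counter
        return counter

    visit(ET.fromstring(xml_text))
    return tags, parent_of


def prufer_encode(xml_text):
    """Delete the smallest-numbered leaf repeatedly, recording its parent."""
    tags, parent_of = number_postorder(xml_text)
    n = len(tags)
    child_count = {i: 0 for i in tags}
    for p in parent_of.values():
        child_count[p] += 1
    leaves = [i for i in tags if child_count[i] == 0]
    heapq.heapify(leaves)
    nps = []                              # numbered sequence (assumed name)
    while len(nps) < n - 1:               # stop when only the root is left
        leaf = heapq.heappop(leaves)
        p = parent_of[leaf]
        nps.append(p)
        child_count[p] -= 1
        if child_count[p] == 0:           # parent has become a leaf
            heapq.heappush(leaves, p)
    lps = [tags[p] for p in nps]          # labeled sequence (assumed name)
    return nps, lps


def is_ancestor(nps, a, d):
    """True if node a is a proper ancestor of node d.

    Relies on post-order numbering: node k is the k-th node deleted,
    so nps[k - 1] is the parent of node k.
    """
    root = len(nps) + 1
    while d != root:
        d = nps[d - 1]
        if d == a:
            return True
    return False


def are_siblings(nps, i, j):
    """True if two distinct non-root nodes share the same parent."""
    root = len(nps) + 1
    return i != j and root not in (i, j) and nps[i - 1] == nps[j - 1]


# Post-order numbers for this document: d=1, e=2, b=3, c=4, a=5.
nps, lps = prufer_encode("<a><b><d/><e/></b><c/></a>")
print(nps)  # [3, 3, 5, 5]
print(lps)  # ['b', 'b', 'a', 'a']
```

In this sketch the parent-child, ancestor-descendant, and sibling relations can all be read from the numbered sequence alone, which is the property the abstract attributes to the Prufer representation. How the paper selects common nodes and weights these relations into a similarity score is not specified here.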

