A new effective method for labeling dynamic XML data

Eynollah Khanjari, Leila Gaeini

doi:10.1186/s40537-018-0161-4

Abstract

Query processing based on labeling dynamic XML documents has gained increasing attention in the past several years. An efficient labeling scheme should produce small labels while keeping the labeling algorithm simple, so that complex computations are avoided, and it should keep the structural relationships between nodes readable from the labels. Moreover, for dynamic XML data, existing nodes should not have to be relabeled when the document is updated. However, existing schemes do not satisfy all of these requirements. In this paper, we propose a new labeling scheme that assigns variable-length labels to nodes in dynamic XML documents. Our method employs the FibLSS encoding scheme, which exploits the properties of the Fibonacci sequence to provide variable-length node labels of appropriate size. When the XML document is updated, we add a new section only to the new node's label, without relabeling any existing node, while preserving both node order and structural relationships. Our labeling method is scalable because it is not subject to overflow, and because label size grows only linearly while the number of labeled nodes grows exponentially, which makes it suitable for big datasets. It also has the lowest computational processing cost among the compared approaches. Experimental results confirm the advantages of our proposed method over state-of-the-art techniques.

Highlights

XML is a semi-structured, standard document format for exchanging data
Node labeling in XML data is one way to increase the efficiency of query processing
An XML labeling method should support the structural relationships among nodes, avoid relabeling existing nodes, and preserve node order when new nodes are inserted into the XML tree

Summary

Introduction

XML is a semi-structured, standard document format for exchanging data. Elements in XML documents are regular, and there are structural relationships between them [1].

Label size analysis

To label the nodes at each level of the XML tree, the number of bits used (i.e. the length field) starts from 1. To assign unique identifiers to N nodes, each with an appropriate length of at least 1 bit and at most n bits, the total storage requirement of the labeling scheme is $N\log_2(N+2) + 2\log_2(N+2) - 2N - 2$ bits. Note that this is the total storage needed in the worst case. We label the nodes of a given level from left to right, using appropriate label sizes that start from 1 bit and grow only as far as needed. After a node is inserted and the XML tree is updated, we add the UpID part to the new node's label, just after its SelfID.
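The worst-case storage formula can be checked numerically. The sketch below is a simplified model, not the paper's FibLSS encoding: it assumes that "appropriate length" means each sibling receives the shortest still-unused binary string, so both 1-bit labels are used first, then all four 2-bit labels, and so on. The helper names `total_label_bits`, `worst_case_formula`, and `max_label_length` are hypothetical. Because there are $2^k$ distinct labels of length $k$, filling every length from 1 to $n$ labels $N = 2^{n+1} - 2$ nodes using $\sum_{k=1}^{n} k\,2^k = (n-1)2^{n+1} + 2$ bits, and substituting $2^{n+1} = N + 2$ gives $N\log_2(N+2) + 2\log_2(N+2) - 2N - 2$.

```python
import math


# Simplified model (assumption): plain binary labels of increasing length,
# NOT the FibLSS encoding used by the paper.

def total_label_bits(num_nodes: int) -> int:
    """Total bits when siblings get the shortest unused binary labels, left to right."""
    total, length, remaining = 0, 1, num_nodes
    while remaining > 0:
        used = min(remaining, 2 ** length)  # 2**length distinct labels of this length
        total += used * length
        remaining -= used
        length += 1
    return total


def worst_case_formula(num_nodes: int) -> float:
    """N*log2(N+2) + 2*log2(N+2) - 2N - 2; exact when N = 2**(n+1) - 2."""
    log_term = math.log2(num_nodes + 2)
    return num_nodes * log_term + 2 * log_term - 2 * num_nodes - 2


def max_label_length(num_nodes: int) -> int:
    """Smallest n with 2**(n+1) - 2 >= N, i.e. the longest label needed."""
    return (num_nodes + 1).bit_length() - 1


if __name__ == "__main__":
    # Compare the direct count with the closed form when every length class is full.
    for n in range(1, 6):
        n_nodes = 2 ** (n + 1) - 2
        print(n_nodes, total_label_bits(n_nodes), worst_case_formula(n_nodes))
```

Under this model, one million siblings need labels of at most 19 bits, which illustrates how slowly label length grows as the number of nodes increases.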

Results and discussion

Label format

Number of retrieved nodes