Solving the Intractable Problem: Optimal Performance for Worst Case Scenarios in XML Twig Pattern Matching

Shtwai Alsubai, Siobhan North

doi:10.1109/access.2020.3033008

Open Access

https://doi.org/10.1109/access.2020.3033008

Abstract

In the history of databases, eXtensible Markup Language (XML) has been thought of as the standard format to store and exchange semi-structured data. With the advent of IoT, XML technologies can play an important role in processing the massive amounts of data generated by heterogeneous devices. As the number and complexity of such datasets increase, there is a need for algorithms that can index and retrieve XML data efficiently, even for complex queries. In this context, twig pattern matching, that is, finding all occurrences of a twig pattern query (TPQ), is a core operation in XML query processing. Until now, holistic joins have been considered the state-of-the-art TPQ processing algorithms, but they guarantee an optimal evaluation only at the expense of excessive storage costs, which limits their scope in large datasets. In this article, we introduce a new approach which significantly outperforms earlier methods in terms of both the size of the intermediate storage and query running time. The approach presented here uses Child Prime Labels (Alsubai & North, 2018) to improve the filtering phase of bottom-up twig matching algorithms, together with a novel algorithm which avoids the use of stacks, thus improving TPQ processing efficiency. Several experiments were conducted on common benchmarks such as the DBLP, XMark and TreeBank datasets to study the performance of the new approach. Multiple analyses on a range of twig pattern queries are presented to demonstrate the statistical significance of the improvements.
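To make the notion of a twig pattern query concrete, the following is a minimal sketch of naive twig matching over a small in-memory document. It is not the algorithm of the paper and not a holistic join: it simply tests every element by recursion. The `QueryNode` class, the `"PC"`/`"AD"` axis names, and the sample document are assumptions introduced for illustration, and the function reports only the elements that can serve as the root of a match rather than enumerating every matching tuple.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class QueryNode:
    """One node of a twig pattern query (hypothetical helper type)."""
    tag: str
    axis: str = "AD"  # edge from the parent query node: "PC" or "AD"
    children: list = field(default_factory=list)


def embeds(elem, q):
    """Return True if the twig rooted at q can be matched with q mapped to elem."""
    if elem.tag != q.tag:
        return False
    for qc in q.children:
        if qc.axis == "PC":
            candidates = list(elem)             # direct children only
        else:
            candidates = list(elem.iter())[1:]  # proper descendants (iter() yields elem first)
        if not any(embeds(c, qc) for c in candidates):
            return False
    return True


def twig_roots(root, q):
    """All elements, in document order, that can act as the root of a match."""
    return [e for e in root.iter() if embeds(e, q)]


# Hypothetical sample document, loosely shaped like bibliographic data.
DOC = ET.fromstring(
    "<dblp>"
    "<article><title>t1</title><author>a</author></article>"
    "<article><title>t2</title></article>"
    "<book><chapter><author>b</author></chapter><title>t3</title></book>"
    "</dblp>"
)

# article with a title child and an author child (both P-C edges)
q1 = QueryNode("article", children=[QueryNode("title", "PC"), QueryNode("author", "PC")])
# book with a title child (P-C) and an author descendant (A-D)
q2 = QueryNode("book", children=[QueryNode("title", "PC"), QueryNode("author", "AD")])
# book with an author child (P-C); the only author sits one level deeper
q3 = QueryNode("book", children=[QueryNode("author", "PC")])

print(len(twig_roots(DOC, q1)), len(twig_roots(DOC, q2)), len(twig_roots(DOC, q3)))
```

The third query shows why P-C edges matter: the same `author` element satisfies the A-D edge in `q2` but not the P-C edge in `q3`. A naive recursion like this revisits subtrees repeatedly, which is the kind of cost that holistic and filtering-based approaches aim to avoid.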

Highlights

XML technology has emerged as the de facto standard for storage of semi-structured data and for data exchange in e-business [19]
A set of novel bottom-up holistic twig matching algorithms is introduced, based on a new advanced preorder filtering function that, unlike previous filtering strategies such as [30], [32], preserves the document order and filters out irrelevant elements when P-C relationships are involved in a Twig Pattern Query (TPQ)
We have presented new approaches that use Child Prime Label (CPL) indexing to improve the filtering phase of bottom-up twig matching algorithms

Summary

INTRODUCTION

XML technology has emerged as the de facto standard for storage of semi-structured data and for data exchange in e-business [19]. The Child Prime Label (CPL) algorithm is an extension of the getNext() core function in the classical holistic twig join algorithm, TwigStack [10]. This new filtering function can filter out irrelevant elements efficiently without either violating the document order or consuming additional space. The paper presents a set of novel bottom-up holistic twig matching algorithms based on a new advanced preorder filtering function that, unlike previous filtering strategies such as [30], [32], preserves the document order and filters out irrelevant elements when parent-child (P-C) relationships are involved in TPQs. Full proofs of correctness are also provided for the algorithms needed to evaluate subsets of TPQs containing P-C and ancestor-descendant (A-D) axes.
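This summary does not spell out how a Child Prime Label is constructed. As a hedged illustration only, the sketch below assumes one plausible prime-based scheme: each distinct tag name receives a unique prime, and an element's label is the product of the primes of its distinct child tags, so that a P-C requirement can be tested by divisibility. The function names, the labelling rule, and the sample document are all assumptions made for this example; the scheme actually used in the paper may differ.

```python
import xml.etree.ElementTree as ET
from itertools import count


def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for a small tag vocabulary)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


def assign_tag_primes(root):
    """Map each distinct tag to a unique prime, in order of first appearance."""
    gen = primes()
    tag_prime = {}
    for e in root.iter():
        if e.tag not in tag_prime:
            tag_prime[e.tag] = next(gen)
    return tag_prime


def child_label(elem, tag_prime):
    """Assumed label: product of the primes of the element's distinct child tags."""
    label = 1
    for tag in {c.tag for c in elem}:
        label *= tag_prime[tag]
    return label


def passes_pc_filter(elem, required_child_tags, tag_prime):
    """True if elem has at least one child for every required tag (divisibility test)."""
    label = child_label(elem, tag_prime)
    return all(
        tag in tag_prime and label % tag_prime[tag] == 0
        for tag in required_child_tags
    )


# Hypothetical sample document.
DOC = ET.fromstring(
    "<dblp>"
    "<article><title>t1</title><author>a</author></article>"
    "<article><title>t2</title></article>"
    "<book><chapter><author>b</author></chapter><title>t3</title></book>"
    "</dblp>"
)

TAG_PRIME = assign_tag_primes(DOC)
articles = DOC.findall("article")
kept = [a for a in articles if passes_pc_filter(a, ["title", "author"], TAG_PRIME)]
print(len(kept))
```

Under these assumptions, an element that lacks a required child tag is rejected from its label alone, without scanning its children at query time, and the elements that pass keep their original document order because the filter only drops candidates.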

RELATED WORK

OPTIMAL TWIG JOINS

TwigPrime

EXPERIMENTAL EVALUATION

Findings

CONCLUSION

Journal: IEEE Access: Practical Innovations, Open Solutions
Publication Date: Jan 1, 2020
Citations: 38
License type: CC BY

Similar Papers

FTwig: Efficient algorithm for processing fuzzy XML twig pattern matching
Jian Liu ... Li Yan
01 Aug 2010

A Prime Number Approach to Matching an XML Twig Pattern including Parent-Child Edges
Shtwai Alsubai ... Siobhán North
01 Jan 2017

TP+Output: Modeling Complex Output Information in XML Twig Pattern Query
Huayu Wu ... Tok Wang Ling
01 Jan 2009

Integration of Relational and Native Approaches to XML Query Processing
Huayu Wu ... Tok Wang Ling
01 Jan 2009
