Abstract

With XML becoming a ubiquitous language for data interoperability in various domains, efficiently querying XML data is a critical issue. This has led to the design of algebraic frameworks based on tree-shaped patterns akin to the tree-structured data model of XML. Tree patterns are graphic representations of queries over data trees; they are matched against an input data tree to answer a query. Since the turn of the 21st century, an astounding research effort has focused on tree pattern models and on the optimization of tree pattern matching, which is a crucial issue. This paper is a comprehensive survey of these topics, in which we outline and compare the various features of tree patterns. We also review and discuss the two main families of approaches for optimizing tree pattern matching, namely pattern tree minimization and holistic matching. Finally, we present actual tree pattern-based developments, to provide a global overview of this significant research topic.
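
To make the matching step concrete, here is a minimal sketch assuming the common unordered tree pattern semantics. Each pattern node maps to a data node with the same label (or any label, for a `*` wildcard). A child edge (`/`) maps to a parent-child pair, and a descendant edge (`//`) maps to an ancestor-descendant pair. The classes `Node` and `PNode` and the function `match` are hypothetical names chosen for illustration, not the surveyed authors' code. The function only reports the data nodes bound to the pattern root, and it is a naive recursive matcher rather than one of the minimization or holistic techniques discussed in the survey.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """A node of a rooted, labeled, ordered data tree."""
    label: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class PNode:
    """A tree pattern node; `axis` is the edge from its parent: '/' (child) or '//' (descendant)."""
    label: str  # '*' is treated as a wildcard (illustrative assumption)
    axis: str = "/"
    children: List["PNode"] = field(default_factory=list)


def descendants(n: Node) -> Iterator[Node]:
    """Yield all proper descendants of n in document (preorder) order."""
    for c in n.children:
        yield c
        yield from descendants(c)


def matches_at(p: PNode, d: Node) -> bool:
    """True if pattern node p maps onto data node d and each sub-pattern embeds below d."""
    if p.label != "*" and p.label != d.label:
        return False
    for q in p.children:
        candidates = d.children if q.axis == "/" else descendants(d)
        if not any(matches_at(q, c) for c in candidates):
            return False
    return True


def match(p: PNode, root: Node) -> List[Node]:
    """Data nodes (root included) onto which the pattern root maps; the root's axis is ignored."""
    return [d for d in [root, *descendants(root)] if matches_at(p, d)]


# Hypothetical data tree: library(book(title, author), book(title))
doc = Node("library", [
    Node("book", [Node("title"), Node("author")]),
    Node("book", [Node("title")]),
])

# Pattern //book[author]: a book element having an author child
pat = PNode("book", "//", [PNode("author", "/")])
print(len(match(pat, doc)))  # 1 (only the first book has an author child)
```

Note that several tree pattern models surveyed return whole matched subtrees or tuples of bindings rather than root bindings only; this sketch keeps the simplest output to stay short.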

Highlights

  • Since its inception in 1998, the Extensible Markup Language (XML) [1] has emerged as a standard for data representation and exchange over the Internet, as many communities adopted it for various purposes, e.g., mathematics with MathML [2], chemistry with CML [3], geography with GML [4] and e-learning with SCORM [5], just to name a few.

  • We investigate the most prominent tree pattern (TP) usages found in the literature, which we classify by the means used to achieve their goals, i.e., TP mining (Section 5.1), TP rewriting (Section 5.2) and extensions to matching (Section 5.3).

  • We provide in this paper a comprehensive survey of XML tree patterns, which are nowadays considered crucial in XML querying and its optimization.


Summary

Introduction

Since its inception in 1998, the Extensible Markup Language (XML) [1] has emerged as a standard for data representation and exchange over the Internet, as many communities, mostly but not exclusively scientific ones, adopted it for various purposes, e.g., mathematics with MathML [2], chemistry with CML [3], geography with GML [4] and e-learning with SCORM [5], just to name a few.

Evaluating path expressions in a tree-structured data model such as XML's is crucial for the overall performance of any query engine [10]. Initial efforts that mapped XML documents into relational databases queried with SQL [11], [12] induced costly table joins. Tree algebras provide a formal framework for query expression and optimization, much as relational algebra does for the SQL language [15].

The content of an XML document is encapsulated within elements that are delimited by tags [1]. These elements form a hierarchy organized in a tree-like structure. Any element (together with its content) other than the root is termed an XML fragment. An XML fragment may be modeled as a finite, rooted, labeled and ordered tree.
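
The tree view of an XML fragment can be illustrated with Python's standard `xml.etree.ElementTree` module. The sketch below uses a hypothetical `library` document, not one taken from the survey. It lists each element as a (depth, label) pair in preorder, which keeps sibling order and therefore encodes a rooted, labeled, ordered tree. It then lists the roots of all XML fragments and evaluates a simple descendant path expression with `findall`. The helper `labeled_tree` is an illustrative name, and leaving attributes and text out of the labeled tree is a simplifying assumption, not the survey's formal definition.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical sample document (not from the survey).
XML = """<library>
  <book><title>XML</title><author>A. Author</author></book>
  <book><title>Trees</title></book>
</library>"""


def labeled_tree(elem: ET.Element, depth: int = 0) -> List[Tuple[int, str]]:
    """Preorder list of (depth, label) pairs; together they encode a rooted, labeled, ordered tree."""
    pairs = [(depth, elem.tag)]
    for child in elem:  # iteration follows document order, so sibling order is kept
        pairs.extend(labeled_tree(child, depth + 1))
    return pairs


root = ET.fromstring(XML)
print(labeled_tree(root))
# [(0, 'library'), (1, 'book'), (2, 'title'), (2, 'author'), (1, 'book'), (2, 'title')]

# Every element other than the root is the root of an XML fragment.
fragment_roots = [e.tag for e in root.iter()][1:]
print(fragment_roots)  # ['book', 'title', 'author', 'book', 'title']

# A descendant path expression (//title) using ElementTree's limited XPath support.
print([t.text for t in root.findall(".//title")])  # ['XML', 'Trees']
```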
