Efficient Preprocesses for Fast Storage and Query Retrieval in Native XML Database

Haw Su-Cheng, Lee Chien-Sing

doi:10.4103/0256-4602.48466

Abstract

XML (Extensible Markup Language) has emerged as one of the popular data representation standards for information storage and exchange. In this paper, we propose an extended INLAB architecture, INLAB2, which focuses on preprocessing the XML document for fast native storage and accurate query retrieval. First, we propose our xParse parser to check the well-formedness of an XML document. Next, we use a (self-end) labeling scheme to encode each element in the XML database by its positional information, so that parent-child (P-C) and ancestor-descendant (A-D) relationships between nodes can be established. Subsequently, our TwigINLAB2 algorithm is used to optimize query retrieval. TwigINLAB2 is a generalization of TwigStack, the stack-based algorithm for matching twig queries. However, the TwigStack algorithm is efficient for A-D relationship queries only. To overcome this limitation, we enhance query retrieval by utilizing indices to speed up the matching and merging phases. Experimental results indicate that, on average, TwigINLAB2 processes twig queries 23% better than the TwigStack algorithm and 10% better than TwigINLAB1 in terms of execution time.
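As an illustration of how positional labels can settle P-C and A-D relationships without walking the tree, the following is a minimal sketch. It assumes a generic (start, end, level) region encoding, which is a common positional scheme and not necessarily the exact labeling used in INLAB2. The names `Label`, `label_document`, `is_ancestor`, and `is_parent` are hypothetical, and the well-formedness check relies on Python's standard XML parser rather than xParse.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """Positional label for one element (assumed region encoding)."""
    tag: str
    start: int   # counter value when the element is entered
    end: int     # counter value when the element is left
    level: int   # depth in the tree, root = 0


def label_document(xml_text):
    """Parse the document and return labels in document order.

    ET.fromstring raises ET.ParseError when the input is not well-formed,
    so parsing doubles as a well-formedness check in this sketch.
    """
    root = ET.fromstring(xml_text)
    labels = []
    counter = 0

    def visit(node, level):
        nonlocal counter
        counter += 1
        start = counter
        slot = len(labels)
        labels.append(None)          # reserve the document-order position
        for child in node:
            visit(child, level + 1)
        counter += 1
        labels[slot] = Label(node.tag, start, counter, level)

    visit(root, 0)
    return labels


def is_ancestor(a, d):
    """A-D test: the ancestor's interval strictly contains the descendant's."""
    return a.start < d.start and d.end < a.end


def is_parent(p, c):
    """P-C test: containment plus a level difference of exactly one."""
    return is_ancestor(p, c) and c.level == p.level + 1


a, b, c, d = label_document("<a><b><c/></b><d/></a>")
```

With labels of this kind, both relationship tests reduce to a few integer comparisons, which is what makes stack-based twig matching over label streams practical.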

Full Text