Statistics-based parallelization of XPath queries in shared memory systems

Rajesh Bordawekar, Anastasios Kementsietsidis, Lipyeow Lim, Bryant Wei-Lun Kok

doi:10.1145/1739041.1739063

Publication Date: Mar 22, 2010

Citations: 23

Affiliation: University of Hawaiʻi at Mānoa

Abstract

The wide availability of commodity multi-core systems presents an opportunity to address the latency issues that have plagued XML query processing. However, simply executing multiple XML queries over multiple cores merely addresses the throughput issue: intra-query parallelization is needed to exploit multiple processing cores for better latency. Toward this effort, this paper investigates the parallelization of individual XPath queries over shared-address-space multi-core processors. Much previous work on parallelizing XPath in a distributed setting failed to exploit the shared-memory parallelism of multi-core systems. We propose a novel, end-to-end parallelization framework that determines the optimal way of parallelizing an XML query. This decision is based on a statistics-based approach that relies on both the query specifics and the data statistics. At each stage of the parallelization process, we evaluate three alternative approaches, namely, data-, query-, and hybrid-partitioning. For a given XPath query, our parallelization algorithm uses XML statistics to estimate the relative efficiencies of these alternatives and find an optimal parallel XPath processing plan. Our experiments using well-known XML documents validate our parallel cost model and optimization framework, and demonstrate that it is possible to accelerate XPath processing using commodity multi-core systems.
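As a rough illustration of the data-partitioning idea mentioned above, the following minimal sketch evaluates a path prefix sequentially, splits the matched nodes into chunks, and lets each worker evaluate the path suffix on its chunk. It is an assumption-laden toy, not the framework described in the paper: the document `DOC`, the prefix/suffix split, the functions `sequential`, `data_partitioned`, and `choose_plan`, and the `min_fanout` threshold are all hypothetical, and the fan-out count stands in only loosely for the paper's XML statistics and cost model.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample document, used only for illustration.
DOC = """<site>
  <people>
    <person><name>A</name><age>30</age></person>
    <person><name>B</name><age>41</age></person>
    <person><name>C</name><age>25</age></person>
    <person><name>D</name></person>
  </people>
</site>"""

PREFIX = "./people/person"  # evaluated once, sequentially
SUFFIX = "./name"           # evaluated per chunk, in parallel


def sequential(root):
    """Evaluate the whole path on a single thread."""
    return [e.text for e in root.findall("./people/person/name")]


def data_partitioned(root, workers=2):
    """Split the nodes matched by PREFIX into contiguous chunks and
    evaluate SUFFIX on each chunk in a separate worker thread."""
    nodes = root.findall(PREFIX)
    if not nodes:
        return []
    size = -(-len(nodes) // workers)  # ceiling division
    chunks = [nodes[i:i + size] for i in range(0, len(nodes), size)]

    def run(chunk):
        return [e.text for node in chunk for e in node.findall(SUFFIX)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(run, chunks))  # map preserves chunk order
    # Chunks are contiguous, so concatenation keeps document order.
    return [text for part in parts for text in part]


def choose_plan(root, workers=2, min_fanout=4):
    """Toy statistics-based choice (an assumption, not the paper's
    cost model): parallelize only if the prefix matches enough nodes."""
    fanout = len(root.findall(PREFIX))
    return "data-partitioned" if fanout >= min_fanout else "sequential"


root = ET.fromstring(DOC)
result = data_partitioned(root, workers=2)
plan = choose_plan(root, workers=2)
```

In CPython, threads will not speed up this pure-Python traversal because of the global interpreter lock; the sketch shows only how the work is divided and how a simple statistic could gate the decision.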