Distributed XPath query processing over large XML data based on MapReduce framework

Hongjie Fan, Dongsheng Wang, Junfei Liu

doi:10.1109/fskd.2016.7603390

Abstract

The volume of XML data is tremendous in many areas, especially in data logging and scientific applications. XPath querying is the core operation of XML processing, and querying massive XML data stored in a distributed manner is challenging. In this paper, we present an efficient distributed XPath query processing approach based on MapReduce that processes queries over massive volumes of XML data in parallel. We first use virtual nodes to split a large-scale XML data file into file splits stored in the distributed storage system. We then present a distributed XPath query algorithm that uses the MapReduce framework to evaluate queries over different fragments of the document tree in parallel. Furthermore, to handle large XML data efficiently, we build a partitional index and use a random access mechanism to perform the query. Experiments show that our approach is efficient and scalable.
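The split-then-query pipeline described above can be illustrated with a minimal, single-process sketch. It is not the authors' algorithm: the abstract does not specify how virtual nodes, the partitional index, or random access work, so this sketch assumes the document is cut between the root's children, that each split keeps the root tag as a "virtual" context path, and that queries are absolute, child-axis-only paths (no `//`, no predicates, no index). The functions `split_document`, `map_split`, `reduce_results`, and `run_query` are hypothetical names; the map and reduce phases are simulated with ordinary function calls rather than a real MapReduce runtime.

```python
import math
import xml.etree.ElementTree as ET


def split_document(xml_text, num_splits):
    # Hypothetical splitter: cut the root's children into contiguous chunks.
    # Each split keeps the root tag as a "virtual" context path so it can be
    # queried independently (an assumed stand-in for the paper's virtual nodes).
    root = ET.fromstring(xml_text)
    children = list(root)
    size = max(1, math.ceil(len(children) / num_splits))
    return [(i // size, [root.tag], children[i:i + size])
            for i in range(0, len(children), size)]


def match_steps(elem, steps):
    # Follow child-axis steps from elem; steps[0] must match elem itself.
    if steps[0] not in ("*", elem.tag):
        return []
    if len(steps) == 1:
        return [elem]
    matches = []
    for child in elem:
        matches.extend(match_steps(child, steps[1:]))
    return matches


def map_split(split_id, prefix, fragment, steps):
    # Map phase: check the query against the virtual prefix, then evaluate the
    # remaining steps on each subtree of this split. Keys record document order.
    head, rest = steps[:len(prefix)], steps[len(prefix):]
    if not rest or any(h not in ("*", p) for h, p in zip(head, prefix)):
        return []
    return [((split_id, pos), elem.text)
            for pos, child in enumerate(fragment)
            for elem in match_steps(child, rest)]


def reduce_results(pairs):
    # Reduce phase: merge partial results from all splits in document order
    # (sorted() is stable, so matches under one subtree keep their order).
    return [text for _, text in sorted(pairs, key=lambda kv: kv[0])]


def run_query(xml_text, query, num_splits=2):
    # Driver: split, run the map phase on every split, then reduce.
    steps = [s for s in query.split("/") if s]
    emitted = []
    for split_id, prefix, fragment in split_document(xml_text, num_splits):
        emitted.extend(map_split(split_id, prefix, fragment, steps))
    return reduce_results(emitted)


DOC = """<library>
  <book><title>A</title></book>
  <book><title>B</title></book>
  <journal><title>C</title></journal>
  <book><title>D</title></book>
</library>"""

print(run_query(DOC, "/library/book/title"))  # ['A', 'B', 'D']
```

Because each split carries its own context path, the map tasks need no communication with each other, and the reducer only has to restore document order. A real system would also have to handle elements that straddle split boundaries and descendant-axis steps, which this sketch deliberately leaves out.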

Similar Papers

TwigStack-MR: An Approach to Distributed XML Twig Query Using MapReduce
Hongjie Fan ... Zhiyi Ma
01 Jun 2016

Handling distributed XML queries over large XML data based on MapReduce framework
Hongjie Fan ... Junfei Liu
Information Sciences | VOL. 453
11 Apr 2018

Efficient Query Processing for Large XML Data in Distributed Environments
Hiroto Kurita ... Jun Miyazaki
01 May 2007

Distributed Processing of XPath Queries Using MapReduce
Matthew Damigos ... Manolis Gergatsoulis
01 Jan 2014