Handling distributed XML queries over large XML data based on MapReduce framework

Hongjie Fan, Zhiyi Ma, Dianhui Wang, Junfei Liu

doi:10.1016/j.ins.2018.04.028

Abstract

With the increase in available extensible markup language (XML) documents, numerous approaches to querying them have been proposed in the literature. XPath queries and twig pattern queries are the two basic approaches, and they directly affect the efficiency of XML operations. Distributed manipulation of massive XML data is challenging. This paper aims to develop an efficient distributed XML query processing method using MapReduce that processes several queries simultaneously on large volumes of XML data. First, we split a large-scale XML data file into file-splits and put them in a distributed storage system. Then, we present an efficient algorithm that uses the MapReduce framework to compute different fragments of the document tree in parallel. To handle a large amount of XML data efficiently, we build a partition index and use a random access mechanism for specific queries. The experimental results show that the proposed approach is efficient and scales well.
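The pipeline described in the abstract (split, index, map in parallel, merge per query) can be illustrated with a small single-machine sketch. This is not the authors' algorithm. It assumes that splits fall on record boundaries (the children of the root element), whereas the paper deals with fragments of the document tree. A thread pool stands in for MapReduce workers, and a simple tag-to-split map stands in for the partition index. All function names, the sample document, and the query format are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample document; each child of the root is one "record".
DOC = """<library>
<book><title>A</title><year>2001</year></book>
<book><title>B</title><year>2016</year></book>
<article><title>C</title></article>
<book><title>D</title><year>2018</year></book>
</library>"""


def split_records(xml_text, records_per_split):
    """Cut the root's children into file-splits (lists of serialized records)."""
    root = ET.fromstring(xml_text)
    records = [ET.tostring(child, encoding="unicode") for child in root]
    return [records[i:i + records_per_split]
            for i in range(0, len(records), records_per_split)]


def build_partition_index(splits):
    """Map each record tag to the ids of the splits that contain it."""
    index = defaultdict(set)
    for split_id, records in enumerate(splits):
        for rec in records:
            index[ET.fromstring(rec).tag].add(split_id)
    return index


def map_split(split_id, records, queries):
    """Map phase: evaluate every query against every record of one split."""
    out = []
    for pos, rec in enumerate(records):
        elem = ET.fromstring(rec)
        for qid, (record_tag, path) in queries.items():
            if elem.tag != record_tag:
                continue
            for k, hit in enumerate(elem.findall(path)):
                # Key by query id; keep positions so document order can be restored.
                out.append((qid, (split_id, pos, k, hit.text)))
    return out


def reduce_by_query(pairs):
    """Reduce phase: group hits by query id and restore document order."""
    grouped = defaultdict(list)
    for qid, value in pairs:
        grouped[qid].append(value)
    return {qid: [v[3] for v in sorted(vals, key=lambda v: v[:3])]
            for qid, vals in grouped.items()}


def run_queries(xml_text, queries, records_per_split=2):
    splits = split_records(xml_text, records_per_split)
    index = build_partition_index(splits)
    # Use the index to visit only splits that can contain a queried record tag.
    wanted = sorted({sid for tag, _ in queries.values()
                     for sid in index.get(tag, ())})
    with ThreadPoolExecutor() as pool:
        mapped = pool.map(lambda sid: map_split(sid, splits[sid], queries), wanted)
        pairs = [p for part in mapped for p in part]
    return reduce_by_query(pairs)


if __name__ == "__main__":
    # Each query is (record tag, ElementTree path relative to the record).
    queries = {"q1": ("book", "title"), "q3": ("article", "title")}
    print(run_queries(DOC, queries))
```

Several queries share one pass over each split, which mirrors the abstract's goal of processing queries simultaneously. A query that only touches `article` records would skip the first split entirely because the index shows no such record there.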

Journal: Information Sciences
Publication Date: Apr 11, 2018
Citations: 12

Similar Papers

TwigStack-MR: An Approach to Distributed XML Twig Query Using MapReduce
Hongjie Fan ... Zhiyi Ma
01 Jun 2016

Parallel labeling of massive XML data with MapReduce
Hyebong Choi ... Kyong-Ha Lee
The Journal of Supercomputing | VOL. 67
29 Aug 2013

XStorM: A Scalable Storage Mapping Scheme for XML Data
... Mong Li Lee
World Wide Web | VOL. 4
01 Jan 2001

Distributed XPath query processing over large XML data based on MapReduce framework
Hongjie Fan ... Junfei Liu
01 Aug 2016
