Abstract
The volume of XML data is tremendous in many areas, especially in data logging and scientific domains. XPath querying is a core operation in XML processing, and querying massive XML data stored in a distributed manner is a challenge. In this paper, we present an efficient distributed XPath query processing approach based on MapReduce, which simultaneously processes queries over a massive volume of XML data. We first use virtual nodes to split a large-scale XML data file into file splits stored in the distributed storage system. We then present a distributed XPath query algorithm that uses the MapReduce framework to evaluate different fragments of the document tree in parallel. Furthermore, to handle large XML data efficiently, we build a partitional index and use a random access mechanism to perform the query. Experimental results show that our approach is efficient and scalable.
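As a rough illustration of the general split/map/reduce pattern described above (not the authors' algorithm), the sketch below splits the root's children into fragments, evaluates one or more XPath expressions on each fragment in a map step, and merges the matches in document order in a reduce step. Several assumptions apply. Every function name (`split_into_fragments`, `map_phase`, `reduce_phase`, `run_queries`) is hypothetical. Wrapping each fragment in a copy of the root tag is only a simplified stand-in for the paper's virtual nodes. MapReduce is simulated in a single process. Queries use the limited XPath subset of Python's `xml.etree.ElementTree`. No partitional index or random access is modeled.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def split_into_fragments(xml_text, num_splits):
    """Split the root's children into at most num_splits fragments.

    Each fragment is wrapped in a fresh element with the root's tag
    (a simplified stand-in for a 'virtual node', assumed here)."""
    root = ET.fromstring(xml_text)
    children = list(root)
    size = max(1, -(-len(children) // num_splits))  # ceiling division
    fragments = []
    for i in range(0, len(children), size):
        virtual_root = ET.Element(root.tag, root.attrib)
        virtual_root.extend(children[i:i + size])
        fragments.append(virtual_root)
    return fragments


def map_phase(fragment_id, fragment, xpaths):
    """Map task: evaluate every query on one fragment independently.

    Emits (query, (fragment_id, text)) pairs so that several queries
    can be answered in a single pass over the data."""
    for xpath in xpaths:
        for match in fragment.findall(xpath):
            yield xpath, (fragment_id, match.text)


def reduce_phase(values):
    """Reduce task: restore document order by fragment id.

    sorted() is stable, so matches from the same fragment keep their order."""
    return [text for _, text in sorted(values, key=lambda v: v[0])]


def run_queries(xml_text, xpaths, num_splits=3):
    fragments = split_into_fragments(xml_text, num_splits)
    shuffled = defaultdict(list)  # the "shuffle" step: group values by key
    for fid, frag in enumerate(fragments):  # map tasks could run in parallel
        for key, value in map_phase(fid, frag, xpaths):
            shuffled[key].append(value)
    return {key: reduce_phase(vals) for key, vals in shuffled.items()}


if __name__ == "__main__":
    doc = """<library>
      <book lang="en"><title>A</title></book>
      <book lang="fr"><title>B</title></book>
      <book lang="en"><title>C</title></book>
      <book lang="en"><title>D</title></book>
    </library>"""
    print(run_queries(doc, ["./book[@lang='en']/title", "./book[@lang='fr']/title"]))
```

This naive split only answers queries whose matches fall entirely inside one top-level subtree. Axes or positional predicates that span fragments would give wrong results, and the whole file is parsed in memory before splitting. A real system would split the raw file by byte ranges. The abstract does not say how its virtual nodes and index address these issues.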