A MapReduce-Based Approach for Prefix-Based Labeling of Large XML Data

Jinhyun Ahn, Dong-Hyuk Im, Hong-Gee Kim

doi:10.1007/978-3-319-50112-3_7

Abstract

A massive amount of XML (Extensible Markup Language) data is available on the web, and this data can be viewed as tree data. One of the fundamental building blocks of information retrieval from tree data is answering structural queries. Various labeling schemes have been suggested for rapid structural query processing. We focus on the prefix-based labeling scheme, which labels each node with a concatenation of its parent’s label and its child order. This scheme has been adapted in RDF (Resource Description Framework) data management systems that index RDF data as trees by grouping subjects. Recently, a MapReduce-based algorithm for the prefix-based labeling scheme was suggested. We observe that this algorithm fails to keep label size minimized, which makes the prefix-based labeling scheme difficult to apply to massive real-world XML datasets. To address this issue, we propose a MapReduce-based algorithm for prefix-based labeling of XML data that reduces label size by adjusting the order of label assignments based on the structural information of the XML data. Experiments with real-world XML datasets show that the proposed approach is more effective than previous works.
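
The prefix-based labeling idea described in the abstract (each node's label is its parent's label followed by its child order) can be illustrated with a minimal single-machine sketch. This is not the MapReduce algorithm the paper proposes, and it does not attempt to minimize label size; the function names `prefix_labels` and `is_ancestor`, the dot separator, the root label "1", and the sample document are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def prefix_labels(root):
    """Map each label to its element tag.

    A node's label is its parent's label, a dot, and its 1-based
    position among its siblings (hypothetical encoding; the root
    is given the label "1").
    """
    labels = {}
    stack = [(root, "1")]
    while stack:
        node, label = stack.pop()
        labels[label] = node.tag
        # Iterating an Element yields its direct children in document order.
        for order, child in enumerate(node, start=1):
            stack.append((child, f"{label}.{order}"))
    return labels


def is_ancestor(a, b):
    """Structural query: is the node labeled `a` a proper ancestor of `b`?

    With prefix-based labels this is a pure string test. The trailing
    dot prevents "1.1" from matching the unrelated label "1.10".
    """
    return b.startswith(a + ".")


# Small hypothetical document used only for this example.
doc = ET.fromstring(
    "<lib><book><title/><author/></book><book><title/></book></lib>"
)
labels = prefix_labels(doc)
```

Here the second child of the first `book` element receives the label `1.1.2`, and `is_ancestor("1.1", "1.1.2")` holds without touching the tree. Because every label embeds the full path from the root, labels grow with depth and with the number of digits in each child order, which is why label size matters for large documents.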

Similar Papers

GCM-Bench: A Benchmark for RDF Data Management System on Microorganism Data
Renfeng Liu ... Jungang Xu
01 Jan 2019

Storage, partitioning, indexing and retrieval in Big RDF frameworks: A survey
Tanvi Chawla ... M.C Govil
Computer Science Review | VOL. 38
15 Oct 2020

Efficient Access Control of Large Scale RDF Data Using Prefix-Based Labeling
Jinhyun Ahn ... Dong-Hyuk Im
IEEE Access | VOL. 8
01 Jan 2020

Even initial feedback vertex set problem is NP-complete
Dan A Simovici ... Gh Grigoras
Information Processing Letters | VOL. 8
01 Feb 1979
