Robust and scalable content-and-structure indexing

Kevin Wellenzohn, Michael H Böhlen, Stefano Zacchiroli, Sven Helmer, Antoine Pietri

doi:10.1007/s00778-022-00764-y

Abstract

Content-and-Structure (CAS) queries are a frequent class of queries on semi-structured hierarchical data: they filter data items based on their location in the hierarchical structure and their value for some attribute. We propose the Robust and Scalable Content-and-Structure (RSCAS) index to efficiently answer CAS queries on big semi-structured data. To get an index that is robust against queries with varying selectivities, we introduce a novel dynamic interleaving that merges the path and value dimensions of composite keys in a balanced manner. We store the interleaved keys in our trie-based RSCAS index, which efficiently supports a wide range of CAS queries, including queries with wildcards and descendant axes. We implement RSCAS as a log-structured merge tree to scale it to data-intensive applications with a high insertion rate. We illustrate RSCAS's robustness and scalability by indexing data from the Software Heritage (SWH) archive, the world's largest publicly available source code archive.
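
To make the notion of a CAS query concrete, the following minimal sketch evaluates one by brute force over a list of composite (path, value) keys. It only illustrates the query semantics described in the abstract; it is not the RSCAS index and does not implement dynamic interleaving, the trie, or the log-structured merge tree. The names `Key`, `path_pattern_to_regex`, and `cas_query`, the `*` and `**` pattern syntax, and the sample data are all assumptions made for illustration.

```python
import re
from typing import List, NamedTuple


class Key(NamedTuple):
    """Composite key: a path (structure) and an attribute value (content)."""
    path: str   # location in the hierarchy, e.g. a file path
    value: int  # attribute value, e.g. a timestamp (hypothetical)


def path_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a path pattern into a regular expression.

    Assumed syntax for this sketch:
      '**' stands for the descendant axis (any characters, including '/'),
      '*'  is a wildcard within a single level (any characters except '/').
    Note that '/**/' here requires at least one intermediate level.
    """
    pieces = []
    for part in re.split(r"(\*\*|\*)", pattern):
        if part == "**":
            pieces.append(".*")
        elif part == "*":
            pieces.append("[^/]*")
        else:
            pieces.append(re.escape(part))
    return re.compile("".join(pieces))


def cas_query(keys: List[Key], pattern: str, lo: int, hi: int) -> List[Key]:
    """Return keys whose path matches the pattern and whose value is in [lo, hi]."""
    rx = path_pattern_to_regex(pattern)
    return [k for k in keys if lo <= k.value <= hi and rx.fullmatch(k.path)]


# Hypothetical sample data, not taken from the SWH archive.
keys = [
    Key("/src/kernel/sched/core.c", 100),
    Key("/src/kernel/sched/fair.c", 250),
    Key("/docs/kernel/index.rst", 120),
    Key("/src/net/ipv4/tcp.c", 180),
    Key("/src/net/ipv4/tcp.h", 150),
]

# All C files anywhere below /src whose value lies between 90 and 200.
result = cas_query(keys, "/src/**/*.c", 90, 200)
for k in result:
    print(k.path, k.value)
```

A linear scan like this touches every key regardless of how selective the path or value predicate is, which is the cost an index over both dimensions is meant to avoid.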

Journal: The VLDB Journal
Publication Date: Oct 15, 2022
Citations: 3
License type: open-access

Similar Papers

Dynamic interleaving of content and structure for robust indexing of semi-structured hierarchical data
Kevin Wellenzohn ... Sven Helmer
Proceedings of the VLDB Endowment, Vol. 13
01 Jun 2020

Big Data Modelling for Predicting Side-Effects of Anticancer Drugs: A Comprehensive Approach
Sai Jyothi Bolla ... S Jyothi
24 Aug 2019

Inserting Keys into the Robust Content-and-Structure (RCAS) Index
Kevin Wellenzohn ... Luka Popovic
01 Jan 2020

The Software Heritage Graph Dataset: Public Software Development Under One Roof
Antoine Pietri ... Diomidis Spinellis
01 May 2019
