Storage and Querying of Large Provenance Graphs Using NoSQL DSE

Andrii Kashliev

doi:10.1109/bigdatasecurity-hpsc-ids49724.2020.00054

Abstract

Provenance metadata captures the derivation history of an entity, such as a dataset obtained through numerous data transformations. It is of great importance for science, among other fields, because it enables reproducibility and makes research results easier to interpret. With the avalanche of provenance produced by today’s society, there is a pressing need to store large provenance graphs and query them with low latency. To address this need, we present a scalable approach to storing and querying provenance graphs using DataStax Enterprise (DSE), a popular NoSQL column-family database system. Specifically, we i) propose a storage scheme, including two novel indices that enable efficient traversal of provenance graphs along causality lines, ii) present an algorithm for building the proposed indices for a given provenance graph, and iii) implement the algorithm and conduct a performance study in which we store and query a provenance graph with over five million vertices using a DSE cluster running in the AWS cloud. The results of the performance study validate the scalability and performance efficiency of our approach.

Similar Papers

A PT-based approach to construct efficient provenance graph for threat alert investigation
Shiming Men ... Mai Ye
ITM Web of Conferences | VOL. 60
01 Jan 2024

Understanding Data Science Lifecycle Provenance via Graph Segmentation and Summarization
Hui Miao ... Amol Deshpande
01 Apr 2019

SubISO: A Scalable and Novel Approach for Subgraph Isomorphism Search in Large Graph
Muhammad Abulaish ... Jahiruddin
01 Jan 2019

VinciDecoder: Automatically Interpreting Provenance Graphs into Textual Forensic Reports with Application to OpenStack
Azadeh Tabiban ... Heyang Zhao
01 Jan 2021
