Efficient Provenance Management via Clustering and Hybrid Storage in Big Data Environments

Die Hu, Gongming Xu, Dan Feng, Xinrui Gu, Yulai Xie, Darrell Long

doi:10.1109/tbdata.2019.2907116

Abstract

Provenance is a type of metadata that records the creation and transformation of data objects. It has been applied in a wide variety of areas, such as security, search, and experimental documentation. However, provenance data is typically voluminous and grows rapidly, which hinders its effective extraction and application. This paper proposes an efficient provenance management system based on clustering and hybrid storage. Specifically, we propose a Provenance-Based Label Propagation Algorithm that regularizes and clusters a large number of irregular provenance records. We then use separate physical storage media, such as SSD and HDD, to store hot and cold data separately, and implement a hot/cold scheduling scheme that automatically updates and migrates data between them. In addition, we implement a feedback mechanism that locates and compresses rarely used cold data according to the query requests. Experiments show that the system can significantly improve provenance query performance with a small run-time overhead.
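The abstract does not give the details of the hot/cold scheduling scheme, so the following is only a minimal sketch of the general idea, assuming that "hot" simply means "most frequently queried since the last scheduling pass". The class `TieredStore`, its methods, and the `hot_capacity` parameter are hypothetical names chosen for illustration, and two in-memory dictionaries stand in for the SSD and HDD tiers.

```python
from collections import Counter


class TieredStore:
    """Toy two-tier store: `hot` stands in for SSD, `cold` for HDD.

    Illustrative only; this is not the scheduling scheme from the paper.
    """

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity  # max number of records kept hot
        self.hot = {}
        self.cold = {}
        self.hits = Counter()  # query counts since the last rebalance

    def put(self, key, value):
        # Assumption: new records start cold and are promoted only if queried.
        self.cold[key] = value

    def get(self, key):
        # Look up first so that a missing key raises before being counted.
        value = self.hot[key] if key in self.hot else self.cold[key]
        self.hits[key] += 1
        return value

    def rebalance(self):
        """Promote the most-queried records to the hot tier, demote the rest."""
        everything = {**self.cold, **self.hot}
        # Sort by descending hit count; break ties by key for determinism.
        ranked = sorted(everything, key=lambda k: (-self.hits[k], k))
        hot_keys = {k for k in ranked[: self.hot_capacity] if self.hits[k] > 0}
        self.hot = {k: everything[k] for k in hot_keys}
        self.cold = {k: v for k, v in everything.items() if k not in hot_keys}
        self.hits.clear()  # start a fresh observation window


store = TieredStore(hot_capacity=2)
for name in ["a", "b", "c"]:
    store.put(name, name.upper())
store.get("a")
store.get("a")
store.get("c")
store.rebalance()
print(sorted(store.hot), sorted(store.cold))
```

A real system would also need to bound migration cost and handle writes during migration; the sketch ignores both and resets the counters on every pass, which is one of several possible windowing choices.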

Journal: IEEE Transactions on Big Data
Publication Date: Dec 1, 2020
Citations: 46

Similar Papers

The Design of Intelligent Transportation Video Processing System in Big Data Environment
Qian Hao ... Lele Qin
IEEE Access | VOL. 8
01 Jan 2020

Rapid growth of thoracic aortic aneurysm: Reality or myth?
Alexandra Sonsino ... John A Elefteriades
The Journal of Thoracic and Cardiovascular Surgery | VOL. 167
12 Jul 2022

Strategic Utilization of Management Information Systems for Efficient Organizational Management in the Age of Big Data
Aliyu Mohammed
Computer Applications: An International Journal | VOL. 10
29 Nov 2023

Application Research of Key Frames Extraction Technology Combined with Optimized Faster R-CNN Algorithm in Traffic Video Analysis
Zhi-Guang Jiang ... Xiao-Tian Shi
Complexity | VOL. 2021
02 Feb 2021
