Abstract

Big data has become a prominent research topic and gives fresh impetus to economic and social development. However, its great value also makes it a target for attacks: big data security incidents have occurred frequently in recent years, and security supervision capabilities for big data do not match its importance. Data provenance, which describes the origins of data and the process by which it arrived at its current state, is an effective approach to data supervision. To make full use of provenance in big data supervision, a provenance model that defines the concepts used to represent provenance types and relations must be built in advance, but current provenance models do not adapt well to big data scenarios. In this paper, we consider the characteristics of big data and the requirements of data security supervision, extend the widely used provenance model PROV-DM through subtyping and new relation definitions, and propose a big data provenance model (BDPM) for data supervision. BDPM supports provenance representation for various data types and diverse data processing modes, so that it can represent the entire data transformation process across the different components of a big data system, and it defines new relations to enrich provenance analysis functions. Based on BDPM, we introduce the constraints that a valid provenance graph must satisfy and present data security supervision methods based on provenance graph analysis. Finally, we evaluate the satisfiability of BDPM through a case study.

Highlights

  • With the advent of the big data era, data has become a basic production factor as important as physical assets and human capital

  • In this paper, considering the characteristics of big data and the requirements of data security supervision, we propose BDPM, a big data provenance model for data security supervision based on PROV-DM

Summary

INTRODUCTION

With the advent of the big data era, data has become a basic production factor as important as physical assets and human capital. We analyze the characteristics of big data [9], [10] and typical big data system technology frameworks [11], [12], review the research related to OPM and PROV-DM, and present the requirements for building a big data provenance model. Preserving the core structure and part of the extended structure of PROV-DM, we further extend the model through subtyping and new relation definitions according to the characteristics of big data and the requirements of security supervision. A malicious user can steal HDFS data directly from the underlying local file system (LFS) without leaving traces in the audit or monitoring information of HDFS [25]. The provenance model therefore needs to express the relationship between related data in different data organization systems, so that operations on them can be analyzed jointly to detect possible abnormal operations. While providing sufficient semantic information, the model should remain concise to reduce the overhead of provenance data collection and storage and the complexity of provenance data analysis.
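The joint analysis of HDFS and LFS operations mentioned above can be illustrated with a minimal sketch. It is not the BDPM definition: the `Access` record, the `stored_in` mapping from LFS block files to HDFS files, and the function `find_untracked_reads` are hypothetical names introduced here, and the sketch assumes that agent identities are comparable across the two layers, which a real deployment would have to establish.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """One recorded read operation (hypothetical record type, not from BDPM)."""
    agent: str   # who performed the read
    entity: str  # identifier of the entity that was read
    layer: str   # "hdfs" or "lfs"


def find_untracked_reads(accesses, stored_in):
    """Return LFS reads that have no matching HDFS-level read by the same agent.

    stored_in maps an LFS block-file id to the HDFS file it belongs to.
    This cross-layer relation is an assumption made for illustration.
    """
    hdfs_reads = {(a.agent, a.entity) for a in accesses if a.layer == "hdfs"}
    suspicious = []
    for a in accesses:
        if a.layer != "lfs":
            continue
        hdfs_file = stored_in.get(a.entity)
        if hdfs_file is None:
            continue  # block is not linked to any HDFS file; ignored here
        if (a.agent, hdfs_file) not in hdfs_reads:
            suspicious.append(a)
    return suspicious


# Illustrative data only: two block files backing one HDFS file.
stored_in = {"blk_1001": "/data/sales.csv", "blk_1002": "/data/sales.csv"}
accesses = [
    Access("alice", "/data/sales.csv", "hdfs"),
    Access("alice", "blk_1001", "lfs"),
    Access("mallory", "blk_1002", "lfs"),  # no HDFS-level record
]
suspicious = find_untracked_reads(accesses, stored_in)
print(suspicious)
```

Without the cross-layer mapping, the two sets of records cannot be compared at all, which is why the model needs an explicit relation between data in different data organization systems.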

BDPM MODEL
DISCUSSION
CONCLUSION AND FUTURE WORK