Abstract

As data has become a salient asset worldwide, dependence among data has kept growing, and the real-world datasets in use today are highly correlated. Over the past few years, researchers have turned their attention to this aspect of data privacy and found correlation among data. Existing data privacy algorithms cannot assure the expected privacy guarantees on such data: the guarantees they provide were sufficient only when no relationship existed between the data in a dataset. Hence, privacy algorithms need to be reconsidered with data correlation taken into account. Some studies have used a well-known machine learning concept, data correlation analysis, to better understand the relationships between data, and this has given some promising results. Although the literature is still compact, a considerable amount of research has addressed correlated data privacy. Researchers have provided solutions using probabilistic models, behavioral analysis, sensitivity analysis, information theory models, statistical correlation analysis, exhaustive combination analysis, temporal privacy leakages, and weighted hierarchical graphs. However, researchers work with real-world datasets that are often large (termed big data) and exhibit a high degree of data correlation. Data correlation in big data must therefore be studied first. Researchers are exploring different analysis techniques to find the most suitable one, after which a measure to guarantee privacy for correlated big data can be suggested. This survey paper presents a detailed survey of the methods proposed by different researchers to deal with the problem of correlated data privacy and correlated big data privacy, and highlights the future scope in this area. The quantitative analysis of the reviewed articles suggests that data correlation is a significant threat to data privacy, and that this threat is magnified with big data. Methods that consider and analyze data correlation show better results than other methods on parameters such as maximum queries executed and mean average error. Hence, there is a pressing need to understand and propose solutions for correlated big data privacy.

Highlights

  • Data privacy is the appropriate use of data held by any individual or organization, unlike data security, which guarantees data confidentiality, integrity, and availability

  • This paper describes the structure and organization of big data, which is fundamental to big data privacy and correlated big data privacy

  • This paper reviews the works that identified data correlation as a privacy threat and tried to maintain data privacy guarantees by treating data correlation as an inherent property of real-world datasets

Summary

Introduction

Data privacy is the appropriate use of data held by any individual or organization, unlike data security, which guarantees data confidentiality, integrity, and availability. Even with a larger value of k in k-anonymity, there may be cases where the sensitive data in an equivalence class do not exhibit diversity. t-closeness [4] is an improved version of l-diversity: it ensures that the distance between the distribution of a sensitive attribute in an equivalence class and the distribution of that attribute in the whole table is no more than a threshold value t. The measures mentioned above can prevent distribution attacks on datasets to a large extent. These measures nonetheless have some drawbacks.
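The t-closeness condition described above can be illustrated with a minimal Python sketch. It is not taken from the paper. The record layout (pairs of equivalence-class ID and sensitive value), the function names, and the sample data are all hypothetical. The distance used here is the total variation distance, which is an assumption made for simplicity; the text above only says "distance", and other choices such as the Earth Mover's Distance are possible. The sketch also counts distinct sensitive values per class, the simplest reading of the diversity that l-diversity asks for.

```python
from collections import Counter, defaultdict


def distribution(values):
    """Relative frequency of each value in a non-empty list."""
    counts = Counter(values)
    return {value: count / len(values) for value, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (dicts).

    Assumption: this is one possible distance, chosen for illustration.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def group_by_class(records):
    """Collect the sensitive values of each equivalence class."""
    groups = defaultdict(list)
    for class_id, sensitive in records:
        groups[class_id].append(sensitive)
    return dict(groups)


def distinct_values_per_class(records):
    """Number of distinct sensitive values in each equivalence class."""
    return {cid: len(set(vals)) for cid, vals in group_by_class(records).items()}


def check_t_closeness(records, t):
    """For each class, is its distance to the whole-table distribution <= t?"""
    overall = distribution([sensitive for _, sensitive in records])
    return {
        cid: total_variation(distribution(vals), overall) <= t
        for cid, vals in group_by_class(records).items()
    }


# Hypothetical sample: (equivalence class ID, sensitive attribute value).
records = [
    ("A", "flu"), ("A", "flu"),
    ("B", "flu"), ("B", "cancer"), ("B", "HIV"), ("B", "flu"),
]

print(distinct_values_per_class(records))
print(check_t_closeness(records, t=0.25))
```

In this sample, class A contains only one sensitive value, so it shows no diversity, and its distribution lies at distance 1/3 from the whole-table distribution, above the threshold of 0.25. Class B lies at distance 1/6 and satisfies the condition.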
