Abstract

Objective: To address modern privacy threats in data analytics by designing an efficient privacy-preserving data analytics technique.

Methods: The method applied is a non-anonymized method that synthesizes quasi identifiers and applies differential privacy. The proposed method was applied to three data sets, viz. the Adult data set, the Statlog data set, and the Indian Liver Patient data set. All three data sets are freely available in the UCI repository.

Findings: The study presents “Synthesize Quasi Identifiers and apply Differential Privacy” (SQIDP), which proved to be a more efficient and scalable algorithm. Compared to anonymity-based algorithms, SQIDP is not prone to similarity attacks, background knowledge attacks, attribute disclosure, or inference attacks. Anonymization, cryptographic, SWARM, and randomization methods reduce data utility, whereas SQIDP offers 100% data utility; hence it is more efficient than these techniques. SQIDP was applied to three data sets with 270, 583, and 48842 records, and the execution time of the algorithm remained the same for all three. SQIDP proved to be a better privacy preservation technique with 100% data utility because it does not anonymize the data, which abides by the recommendations of many privacy legislations such as the GDPR (General Data Protection Regulation) of the European Union and the PDP (Personal Data Protection bill) of India.

Keywords: Data privacy; privacy regulations; privacy preservation; synthetic data; differential

Highlights

  • The majority of the privacy preservation methods developed in the past were based on anonymization techniques which will reduce data utility [1]

  • We examined the key aspects of privacy legislations and modern privacy threats, and proposed a privacy preservation algorithm called “Synthesize Quasi Identifiers and apply Differential Privacy” (SQIDP) to offer privacy preservation in data analytics

  • In SQIDP, the quasi identifiers were replaced with synthetic data generated using random variates having a specified normal distribution
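
The two ideas named in the highlights above, replacing quasi identifiers with normal random variates and applying differential privacy, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the "specified" normal distribution is fitted to each original column (mean and standard deviation), and that differential privacy is applied through the Laplace mechanism on a count query; the source states neither detail. The function names and the toy column names (`age`, `hours`, `income`) are hypothetical.

```python
import random
import statistics


def synthesize_column(values, rng):
    """Return normal random variates in place of a numeric quasi-identifier column.

    Assumption: the normal distribution is parameterised by the mean and
    standard deviation of the original column.
    """
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [rng.gauss(mu, sigma) for _ in values]


def replace_quasi_identifiers(records, qi_keys, seed=0):
    """Copy the records and overwrite each quasi-identifier with synthetic values.

    Non-quasi-identifier attributes are left untouched, and the input
    records are not modified.
    """
    rng = random.Random(seed)
    out = [dict(r) for r in records]
    for key in qi_keys:
        synthetic = synthesize_column([r[key] for r in records], rng)
        for row, value in zip(out, synthetic):
            row[key] = round(value)
    return out


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) variate as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng):
    """Count matching records and add Laplace noise (assumed mechanism).

    A count query has sensitivity 1, so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Toy records; column names are illustrative only.
    data = [
        {"age": 39, "hours": 40, "income": "<=50K"},
        {"age": 50, "hours": 13, "income": "<=50K"},
        {"age": 38, "hours": 45, "income": ">50K"},
        {"age": 53, "hours": 40, "income": ">50K"},
    ]
    released = replace_quasi_identifiers(data, ["age", "hours"], seed=1)
    rng = random.Random(1)
    print(released)
    print(noisy_count(released, lambda r: r["income"] == ">50K", 1.0, rng))
```

In this sketch the quasi-identifier columns of the released records are synthetic while the remaining attributes stay as they were, and query answers are perturbed only at release time through `noisy_count`.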


Summary

Introduction

The majority of the privacy preservation methods developed in the past were based on anonymization techniques, which reduce data utility [1]. Swarm-based anonymization techniques for privacy preservation are a recent development in the field, but privacy legislations recommend non-anonymization-based solutions to ensure data utility [2]. The application of K-anonymity together with perturbation techniques has been studied, but it suffers from the data utility problem [3]. To achieve maximum data utility, a non-anonymized solution is preferred. We examined the key aspects of privacy legislations and modern privacy threats, and proposed a privacy preservation algorithm called SQIDP to offer privacy preservation in data analytics. The key features of the algorithm and the main contributions of the study are listed below

