Big data anonymization using Spark for enhanced privacy protection

Abdelmadjid Guessoum Graba, Adil Toumouh

doi:10.11591/ijece.v14i4.pp4686-4696

Open Access

Abstract

This article introduces a solution for anonymizing large-scale sensitive data that addresses the limitations of traditional approaches on very large datasets. We propose a method that uses the Spark distributed computing framework to parallelize the anonymization process, improving efficiency and scalability. Built on Spark's resilient distributed datasets (RDDs), the approach combines two primary operations, Map_RDD and ReduceByKey_RDD, to execute the anonymization tasks. Our experimental evaluation demonstrates the solution's effectiveness and improved performance in preserving data privacy while balancing data utility and confidentiality. A significant contribution of the study is that it offers data owners a wide array of solutions: for a 500 MB dataset at an anonymity level of K=100, the method produces 832 unique solutions. The study also opens avenues for future research on applying other privacy models, such as l-diversity and t-closeness, within the Spark ecosystem.
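To make the Map and ReduceByKey pattern named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the anonymity level K refers to k-anonymity and emulates the two steps in plain Python so it runs without a cluster; in PySpark the corresponding calls would be `rdd.map(...)` and `rdd.reduceByKey(...)`. The records, the `generalize` function, its `age_width` and `zip_prefix` parameters, and the `reduce_by_key` helper are all hypothetical names chosen for illustration.

```python
# Hypothetical toy records: (age, zip_code, diagnosis). Age and zip code are
# treated as quasi-identifiers; diagnosis is the sensitive attribute.
RECORDS = [
    (34, "13001", "flu"),
    (36, "13002", "asthma"),
    (38, "13009", "flu"),
    (52, "22010", "diabetes"),
    (57, "22015", "flu"),
]


def generalize(record, age_width=10, zip_prefix=2):
    """Map step: emit (generalized quasi-identifiers, 1) for one record."""
    age, zip_code, _sensitive = record
    low = (age // age_width) * age_width
    age_range = f"{low}-{low + age_width - 1}"
    zip_masked = zip_code[:zip_prefix] + "*" * (len(zip_code) - zip_prefix)
    return ((age_range, zip_masked), 1)


def reduce_by_key(pairs, func):
    """Local stand-in for Spark's reduceByKey: merge values sharing a key."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


def is_k_anonymous(records, k, **levels):
    """Check that every equivalence class holds at least k records."""
    pairs = map(lambda r: generalize(r, **levels), records)
    class_sizes = reduce_by_key(pairs, lambda a, b: a + b)
    return all(size >= k for size in class_sizes.values()), class_sizes


if __name__ == "__main__":
    ok, sizes = is_k_anonymous(RECORDS, k=2)
    print(ok, sizes)
```

In this sketch, each choice of generalization levels that passes the check is one candidate anonymization; coarser levels pass more easily but keep less detail, which is the privacy and utility trade-off the abstract refers to.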

Journal: International Journal of Electrical and Computer Engineering (IJECE)
Publication Date: Aug 1, 2024
License type: CC BY-SA 4.0

Similar Papers

A Multi-Level Distributed Computing Approach to XDraw Viewshed Analysis Using Apache Spark
Junduo Dong ... Jianbo Zhang
Remote Sensing | VOL. 15
28 Jan 2023

A Study of Big Data Analytics using Apache Spark with Python and Scala
Yogesh Kumar Gupta ... Surbhi Kumari
03 Dec 2020

A distributed computing model for big data anonymization in the networks.
Farough Ashkouti ... Keyhan Khamforoosh
PLOS ONE | VOL. 18
28 Apr 2023

Dynamic Container-based Resource Management Framework of Spark Ecosystem
Nawab Muhammad Faseeh Qureshi ... Jaehyoun Kim
01 Feb 2019
