Rapid Re-Identification Risk Assessment for Anonymous Data Set in Mobile Multimedia Scene

Zhigang Yang, Yu Xiong, Daizhong Luo, Ruyan Wang

doi:10.1109/access.2020.2977404

Abstract

Ubiquitous mobile multimedia applications bring great convenience to users. However, when enjoying mobile multimedia services, users provide personal data to service platforms. Although the service platforms always claim that the collected personal data are de-identified, the risk of re-identifying users through linkage attacks still exists and is difficult to quantify. This paper proposes a rapid prediction model for the overall re-identification risk based on the statistics of data sets (i.e., the number of individuals, the number of attributes, the distribution of attribute values, and attribute dependency). The proposed model reveals the impact of these statistics on the overall re-identification risk and adopts random sampling and semi-random sampling methods to predict the overall re-identification risk of data sets without and with strong dependency ordered attribute pairs, respectively. Experimental results show that for data sets without strong dependency ordered attribute pairs, the random sampling method has a high prediction accuracy (the prediction error is less than 0.05). For data sets with strong dependency ordered attribute pairs, the semi-random sampling method has a high prediction accuracy (the prediction error is less than 0.09). Using our model, governments and individuals can quickly assess the privacy leakage risk of their data sets, given only the statistics of the data sets. In addition, the model can evaluate the privacy risk of data collection schemes in advance according to historical statistics, and identify suspect services.
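The random sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes, purely for illustration, that the overall re-identification risk can be approximated by the fraction of records whose attribute tuple is unique in the data set, and that attributes are independent. The names `overall_risk`, `sample_random_dataset`, and `predict_risk` are hypothetical.

```python
import random
from collections import Counter


def overall_risk(records):
    """Assumed ORR proxy: fraction of records whose attribute tuple is unique."""
    if not records:
        return 0.0
    counts = Counter(records)
    unique = sum(1 for r in records if counts[r] == 1)
    return unique / len(records)


def sample_random_dataset(n_individuals, distributions, rng):
    """Draw every attribute independently from its marginal distribution.

    `distributions` is a list with one dict per attribute,
    mapping each attribute value to its relative frequency.
    """
    columns = []
    for dist in distributions:
        values = list(dist.keys())
        weights = list(dist.values())
        columns.append(rng.choices(values, weights=weights, k=n_individuals))
    # Turn the per-attribute columns into one tuple per individual.
    return list(zip(*columns))


def predict_risk(n_individuals, distributions, n_trials=50, seed=0):
    """Average the assumed ORR proxy over random data sets with the same statistics."""
    rng = random.Random(seed)
    risks = [
        overall_risk(sample_random_dataset(n_individuals, distributions, rng))
        for _ in range(n_trials)
    ]
    return sum(risks) / len(risks)


if __name__ == "__main__":
    # Hypothetical statistics: 1000 individuals, three quasi-identifiers.
    stats = [
        {"male": 0.5, "female": 0.5},
        {age: 1 / 60 for age in range(18, 78)},
        {zip_code: 1 / 100 for zip_code in range(100)},
    ]
    print(predict_risk(1000, stats))
```

Only the statistics (number of individuals and per-attribute value distributions) are needed as input, which matches the abstract's claim that the raw data set is not required. The independence assumption is exactly what breaks down when strongly dependent attribute pairs exist.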

Highlights

With the wide popularity of smart terminals and the development of wireless communication technology, mobile multimedia applications have become indispensable tools for daily life and work [1]–[3].
We propose the R3A model, in which the overall re-identification risk (ORR) of a target data set is predicted by the average ORR of random data sets with the same statistics.
Given that the confidence of a frequent tuple (a, b) in the target data set is b_a, the semi-random sampling method is shown as Algorithm 2.
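Algorithm 2 itself is not reproduced on this page, so the following is only one plausible reading of semi-random sampling, not the authors' algorithm. It assumes a single dependent attribute pair: attribute A is drawn from its marginal distribution, and whenever A takes the value a, attribute B is set to b with probability equal to the confidence of the tuple (a, b); otherwise B is drawn from its marginal. The names `draw` and `sample_semi_random` are hypothetical.

```python
import random

_NO_EXCLUDE = object()  # sentinel meaning "exclude nothing"


def draw(dist, rng, exclude=_NO_EXCLUDE):
    """Draw one value from a {value: weight} dict, optionally skipping one value."""
    values = [v for v in dist if v != exclude]
    weights = [dist[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


def sample_semi_random(n_individuals, dist_a, dist_b, rule, rng):
    """Sample (A, B) records that respect one frequent tuple.

    `rule` is (a, b, confidence): when A == a, B == b with that probability
    (an assumed interpretation of the confidence in the highlight above).
    """
    a_star, b_star, confidence = rule
    records = []
    for _ in range(n_individuals):
        a = draw(dist_a, rng)
        if a == a_star:
            if rng.random() < confidence or len(dist_b) == 1:
                b = b_star  # the dependency holds for this record
            else:
                # Dependency does not hold: pick any other value of B.
                b = draw(dist_b, rng, exclude=b_star)
        else:
            b = draw(dist_b, rng)  # no dependency: purely random draw
        records.append((a, b))
    return records


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical statistics for two attributes and one frequent tuple.
    dist_city = {"north": 0.3, "south": 0.7}
    dist_zip = {"1000": 0.25, "2000": 0.25, "3000": 0.5}
    data = sample_semi_random(1000, dist_city, dist_zip, ("north", "1000", 0.9), rng)
    north = [b for a, b in data if a == "north"]
    print(sum(b == "1000" for b in north) / len(north))
```

Records generated this way can then be scored with whatever risk measure is used for purely random samples. The only difference is that the dependent pair is no longer sampled independently.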

Summary

Introduction

With the wide popularity of smart terminals and the development of wireless communication technology, mobile multimedia applications have become indispensable tools for daily life and work [1]–[3]. Ubiquitous access, rich functionality, and a good user experience make mobile multimedia applications increasingly popular. To increase user stickiness, improve user experience, or accumulate data resources, mobile multimedia service providers collect users' personal information while providing services. While enjoying the convenience of mobile multimedia services, users must therefore accept the risk of privacy disclosure. For example, user trajectories can expose sensitive information such as home address and workplace. Information collectors always claim that the purpose of collecting personal data is to provide better services to users, and that personal information will be de-identified and properly preserved. However, many service provider data breaches, such as the Facebook data privacy scandal and the Equifax data breach, suggest that improper data sharing and ubiquitous hacking make data stored on servers highly vulnerable. Although the leaked data may not contain the user's identity, the user's quasi-identifiers such as age, gender, and zip code in the anonymous data can be collected by many

Journal: IEEE Access
Publication Date: Jan 1, 2020
Citations: 6
License type: CC BY 4.0
