Abstract

Ubiquitous mobile multimedia applications bring great convenience to users. However, when using mobile multimedia services, users provide personal data to service platforms. Although service platforms always claim that the collected personal data are de-identified, the risk of re-identifying users through linkage attacks still exists and is incalculable. This paper proposes a rapid prediction model for the overall re-identification risk based on the statistics of a data set (i.e., the number of individuals, number of attributes, distribution of attribute values, and attribute dependency). The proposed model reveals the impact of these statistics on the overall re-identification risk and adopts a random sampling method and a semi-random sampling method to predict the overall re-identification risk of data sets without and with strong dependency ordered attribute pairs, respectively. Experimental results show that for data sets without strong dependency ordered attribute pairs, the random sampling method has high prediction accuracy (the prediction error is less than 0.05). For data sets with strong dependency ordered attribute pairs, the semi-random sampling method has high prediction accuracy (the prediction error is less than 0.09). Using our model, governments and individuals can quickly assess the privacy leakage risk of their data sets, given only the statistics of the data sets. In addition, the model can evaluate the privacy risk of data collection schemes in advance according to historical statistics, and identify suspicious services.
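
The random sampling idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not define the overall re-identification risk, so the sketch assumes it is the fraction of individuals whose combination of attribute values is unique in the data set. The function names (`uniqueness_risk`, `sample_dataset`, `predict_orr`) and the parameters `n_samples` and `seed` are hypothetical.

```python
import random
from collections import Counter


def uniqueness_risk(records):
    """ASSUMED risk measure: fraction of records whose attribute
    combination appears exactly once in the data set."""
    counts = Counter(records)
    return sum(1 for r in records if counts[r] == 1) / len(records)


def sample_dataset(n_individuals, distributions, rng):
    """Draw every attribute independently from its value distribution.

    `distributions` holds one dict per attribute, mapping value -> probability.
    """
    columns = []
    for dist in distributions:
        values = list(dist.keys())
        weights = list(dist.values())
        columns.append(rng.choices(values, weights=weights, k=n_individuals))
    # Transpose the columns into one tuple per individual.
    return list(zip(*columns))


def predict_orr(n_individuals, distributions, n_samples=50, seed=0):
    """Average the assumed risk over random data sets that share the
    given statistics (size, attribute count, value distributions)."""
    rng = random.Random(seed)
    risks = [
        uniqueness_risk(sample_dataset(n_individuals, distributions, rng))
        for _ in range(n_samples)
    ]
    return sum(risks) / len(risks)


if __name__ == "__main__":
    # Illustrative statistics only; these are not values from the paper.
    stats = [
        {"F": 0.5, "M": 0.5},
        {"18-29": 0.3, "30-49": 0.4, "50+": 0.3},
    ]
    print(predict_orr(1000, stats))
```

Because the attributes are drawn independently, this sketch only matches the case without strong dependency ordered attribute pairs.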

Highlights

  • With the wide popularity of smart terminals and the development of wireless communication technology, mobile multimedia applications have become an indispensable tool for daily life and work [1]–[3]

  • We propose the R3A model, in which the overall re-identification risk (ORR) of a target data set is predicted by the average ORR of random data sets with the same statistics

  • Given that the confidence of a frequent tuple (a, b) in the target data set is b_a, the semi-random sampling method is given as Algorithm 2
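
Algorithm 2 is not reproduced in this summary, so the following Python sketch shows only one plausible reading of semi-random sampling for a single ordered attribute pair. It assumes that when the first attribute takes the value a, the second attribute is forced to b with probability equal to the stated confidence, and is drawn from its own distribution otherwise. The function name `semi_random_pairs` and the `rules` mapping are hypothetical.

```python
import random


def semi_random_pairs(n, dist_a, dist_b, rules, seed=0):
    """Sample n (a, b) pairs under an ASSUMED dependency model.

    dist_a, dist_b: dicts mapping value -> probability for each attribute.
    rules: dict mapping a value of the first attribute to (b, confidence),
           one entry per frequent tuple (a, b).
    """
    rng = random.Random(seed)
    a_vals, a_weights = zip(*dist_a.items())
    b_vals, b_weights = zip(*dist_b.items())
    rows = []
    for _ in range(n):
        a = rng.choices(a_vals, weights=a_weights)[0]
        if a in rules and rng.random() < rules[a][1]:
            # Dependent draw: follow the frequent tuple.
            b = rules[a][0]
        else:
            # Independent draw; this can still return the rule's b,
            # so the realized confidence is at least the target value.
            b = rng.choices(b_vals, weights=b_weights)[0]
        rows.append((a, b))
    return rows


if __name__ == "__main__":
    # Illustrative statistics only; these are not values from the paper.
    rows = semi_random_pairs(
        5,
        {"city1": 0.6, "city2": 0.4},
        {"zip1": 0.5, "zip2": 0.5},
        {"city1": ("zip1", 0.9)},
    )
    print(rows)
```

A full data set would draw the remaining attributes independently and then score the result with whatever risk measure is in use.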


Summary

Introduction

With the wide popularity of smart terminals and the development of wireless communication technology, mobile multimedia applications have become an indispensable tool for daily life and work [1]–[3]. Ubiquitous access, rich functions, and a good user experience make mobile multimedia applications increasingly popular. To increase user stickiness, improve user experience, or build up data resources, mobile multimedia service providers collect users' personal information while providing services. While enjoying the convenience of mobile multimedia services, users must therefore accept the risk of privacy disclosure. For example, user trajectories can expose sensitive information such as home address and workplace. Information collectors always claim that the purpose of collecting personal data is to provide better services to users, and that personal information will be de-identified and properly preserved. However, many service provider data breaches, such as the Facebook data privacy scandal and the Equifax data breach, suggest that improper data sharing and ubiquitous hacking make data stored on servers highly vulnerable. Although the leaked data may not contain the user's identity, the user's quasi-identifiers such as age, gender, and zip code in the anonymous data can be collected by many
