Abstract

Although anonymous data are not considered personal data, recent research has shown that individuals can often be re-identified. Scholars have argued that previous findings apply only to small-scale datasets and that privacy is preserved in large-scale datasets. Using 3 months of location data, we (1) show that the risk of re-identification decreases slowly with dataset size, (2) approximate this decrease with a simple model taking into account three population-wide marginal distributions, and (3) prove that unicity is convex and obtain a linear lower bound. Our estimates show that 93% of people would be uniquely identified in a dataset of 60M people using four points of auxiliary information, with a lower bound at 22%. This lower bound increases to 87% when five points are available. Taken together, our results show that the privacy of individuals is very unlikely to be preserved even in country-scale location datasets.
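Unicity, as used here, is the fraction of people in a dataset who are singled out by p points of auxiliary information. The sketch below shows a common Monte Carlo way to estimate it. It assumes each trace is a set of hypothetical (antenna_id, hour) points. The estimator picks a user, draws p of that user's points, and checks whether any other trace also contains all of them. The helper `linear_lower_bound` illustrates a generic property of convex functions; it is not the authors' exact derivation. If unicity U(n) is convex in the dataset size n, then for n1 < n2 < n the secant through (n1, U(n1)) and (n2, U(n2)), extended past n2, lies below U(n). All names here are hypothetical.

```python
import random
from typing import Dict, Set, Tuple

# Hypothetical point format: (antenna_id, hour). A trace is one person's set of points.
Point = Tuple[int, int]


def is_unique(traces: Dict[str, Set[Point]], user: str, known: Set[Point]) -> bool:
    """True if `user` is the only person whose trace contains every known point."""
    matches = [u for u, trace in traces.items() if known <= trace]
    return matches == [user]


def estimate_unicity(traces: Dict[str, Set[Point]], p: int,
                     trials: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate: share of sampled users singled out by p of their own points."""
    rng = random.Random(seed)
    eligible = [u for u, t in traces.items() if len(t) >= p]  # users with at least p points
    hits = 0
    for _ in range(trials):
        user = rng.choice(eligible)
        known = set(rng.sample(sorted(traces[user]), p))  # the attacker's auxiliary points
        hits += int(is_unique(traces, user, known))
    return hits / trials


def linear_lower_bound(n1: float, u1: float, n2: float, u2: float, n: float) -> float:
    """Extend the secant through (n1, u1) and (n2, u2) to a larger size n > n2.

    Assuming unicity is convex in dataset size, the true value at n lies on or
    above this line. Unicity cannot be negative, so the result is clamped at 0.
    """
    slope = (u2 - u1) / (n2 - n1)
    return max(0.0, u2 + slope * (n - n2))
```

On real data, one could call `estimate_unicity(traces, p=4)` on random subsamples of several sizes, then pass two of those measurements to `linear_lower_bound` to extrapolate to a larger population.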

Highlights

  • Unicity has been used to quantify re-identification risk across a number of domains, including the mobility of vehicles,[33] apps installed on smartphones over time,[34,35] smart cards used in public transport,[24] credit card transaction histories,[36] and location data from mobile phones in a number of countries.[32,37,38]

  • To further study how unicity decreases with dataset size and whether it decreases sufficiently in population-scale datasets, we propose a simple statistical model taking into account three population-wide marginal distributions, circadian (P_C), frequency (P_F), and activity (P_A), along with the network of mobile phone antennas in a country.
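The model is described here only in terms of its three marginals. The sketch below is a hypothetical, simplified reading of it. Each synthetic user draws a number of points from an activity distribution (P_A). Each point is placed at one of the user's own ranked antennas according to a frequency distribution (P_F) and at an hour of the week according to a circadian distribution (P_C). The following are all illustrative assumptions, not the authors' parameters:

- the antenna count
- the uniform choice of each user's antennas, where the model instead uses a country's real antenna network
- the hourly weekly bins
- every distribution value

```python
import random
from typing import List, Set, Tuple

Point = Tuple[int, int]  # (antenna_id, hour_of_week)
HOURS = 168              # hypothetical resolution: one week of hourly bins


def make_trace(rng: random.Random, n_antennas: int, p_circadian: List[float],
               p_frequency: List[float], activity_levels: List[int],
               p_activity: List[float]) -> Set[Point]:
    """Draw one synthetic trace from three hypothetical marginals."""
    n_points = rng.choices(activity_levels, weights=p_activity)[0]     # P_A: how many points
    favourites = rng.sample(range(n_antennas), len(p_frequency))        # user's ranked antennas
    ranks = rng.choices(range(len(p_frequency)), weights=p_frequency, k=n_points)  # P_F
    hours = rng.choices(range(HOURS), weights=p_circadian, k=n_points)             # P_C
    # Repeated (antenna, hour) pairs collapse, so a trace may have fewer than n_points points.
    return {(favourites[r], h) for r, h in zip(ranks, hours)}


def unicity(traces: List[Set[Point]], p: int, trials: int, rng: random.Random) -> float:
    """Share of sampled users whose p randomly drawn points match no other trace."""
    eligible = [i for i, t in enumerate(traces) if len(t) >= p]
    hits = 0
    for _ in range(trials):
        i = rng.choice(eligible)
        known = set(rng.sample(sorted(traces[i]), p))
        n_matches = sum(1 for t in traces if known <= t)  # includes the user's own trace
        hits += int(n_matches == 1)
    return hits / trials


def simulate(n_users: int, p: int = 4, seed: int = 0) -> float:
    """Generate n_users synthetic traces and estimate unicity with p known points."""
    rng = random.Random(seed)
    # Made-up marginals: busier daytime hours, Zipf-like visits, two activity levels.
    p_circadian = [3.0 if 8 <= h % 24 <= 20 else 1.0 for h in range(HOURS)]
    p_frequency = [1.0 / (k + 1) for k in range(10)]
    traces = [make_trace(rng, n_antennas=500, p_circadian=p_circadian,
                         p_frequency=p_frequency, activity_levels=[20, 100],
                         p_activity=[0.5, 0.5])
              for _ in range(n_users)]
    return unicity(traces, p, trials=200, rng=rng)
```

You can call `simulate` with increasing `n_users` (for example 100, 1000, 5000). This gives a rough, purely synthetic view of how unicity responds to population size under these made-up marginals. It is not expected to reproduce the figures reported in the abstract.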

Introduction

Throughout our day, we interact with many digital services when using our phone, paying with our credit card, or using public transport with a smart card. As a result, our location data are collected broadly, sometimes at the scale of entire countries. Vodafone UK collects location trajectories of 20M citizens,[1] a third of the population, while up to 5 million people use London's subway daily.[2] Location data have been used extensively in research. Mobility data can be used to monitor urban activity[3] and help design better cities.[4] In epidemiology, they have been used to monitor and mitigate the spread of infectious diseases such as
