Abstract

Large-scale collection of human behavioural data by companies raises serious privacy concerns. We show that behaviour captured in the form of application usage data collected from smartphones is highly unique even in large datasets encompassing millions of individuals. This makes behaviour-based re-identification of users across datasets possible. We study 12 months of data from 3.5 million people from 33 countries and show that although four apps are enough to uniquely re-identify 91.2% of individuals using a simple strategy based on public information, there are considerable seasonal and cultural variations in re-identification rates. We find that people have more unique app-fingerprints during summer months, making it easier to re-identify them. Further, we find significant variations in uniqueness across countries, and reveal that American users are the easiest to re-identify, while Finns have the least unique app-fingerprints. We show that differences across countries can largely be explained by two characteristics of the country-specific app-ecosystems: the popularity distribution and the size of app-fingerprints. Our work highlights problems with current policies intended to protect user privacy and emphasizes that policies cannot be ported directly between countries. We anticipate that this will add nuance to the discussion around re-identifiability in digital datasets and improve digital privacy.
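To make the notion of app-fingerprint uniqueness concrete, the following is a minimal sketch of how a re-identification rate could be estimated on a toy dataset. It is not the authors' implementation: the function name `uniqueness_rate`, the toy data, and the choice of each user's least popular apps as the attacker's knowledge are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Set


def uniqueness_rate(fingerprints: Dict[str, Set[str]], n_apps: int) -> float:
    """Fraction of users singled out by n_apps of their own apps.

    Assumption (hypothetical strategy): the attacker knows each user's
    n_apps least popular apps, where popularity is the number of users
    in the dataset who have the app.
    """
    # Popularity of each app across the whole dataset.
    popularity = Counter(app for apps in fingerprints.values() for app in apps)

    eligible = 0  # users with at least n_apps apps
    unique = 0    # users matched by nobody but themselves

    for apps in fingerprints.values():
        if len(apps) < n_apps:
            continue
        eligible += 1
        # Pick the n_apps rarest apps; ties are broken alphabetically.
        ranked = sorted(apps, key=lambda a: (popularity[a], a))
        known = set(ranked[:n_apps])
        # Count users whose fingerprint contains every known app.
        matches = sum(1 for other in fingerprints.values() if known <= other)
        if matches == 1:
            unique += 1

    return unique / eligible if eligible else 0.0


# Hypothetical toy dataset: user id -> set of installed/used apps.
toy = {
    "u1": {"maps", "chat", "bank_a"},
    "u2": {"maps", "chat", "bank_b"},
    "u3": {"maps", "chat"},
    "u4": {"maps", "chat"},
}

rate_one_app = uniqueness_rate(toy, 1)     # 0.5: only u1 and u2 are unique
rate_three_apps = uniqueness_rate(toy, 3)  # 1.0: only u1 and u2 are eligible
```

In this toy example, users whose fingerprints contain a rare app are singled out immediately, while users with only popular apps are never unique. This mirrors the two factors named in the abstract: the popularity distribution of apps and the size of app-fingerprints.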

Highlights

  • Large-scale collection of human behavioural data by companies raises serious privacy concerns

  • Tracking behaviour is a fundamental part of the big-data economy, allowing companies and organizations to segment, profile and understand their users in increasingly greater detail[1]

  • Because the standard cookie-based methods for identifying customers are not used on smartphones, and ad-blocker usage among users is rising[19], companies, advertisers, and so-called data brokers are using smartphone apps to identify and track individuals

Summary

Introduction

Large-scale collection of human behavioural data by companies raises serious privacy concerns. We show that behaviour captured in the form of application usage data collected from smartphones is highly unique even in large datasets encompassing millions of individuals. This makes behaviour-based re-identification of users across datasets possible. Data broker companies obtain vast amounts of personal data, enrich it with additional online and offline sources, and re-sell the improved datasets to the highest bidder, typically without the explicit consent or knowledge of users[24]. An example of this is TalkingData, China’s largest third-party mobile data platform, which collects and sells app usage data of more than 750 million smartphone users[25].

