Abstract

In the Internet era, users’ fundamental privacy and anonymity rights have received significant research and regulatory attention. This is not only a result of the exponential growth of the data that users generate when accomplishing their daily tasks on computing devices with advanced capabilities, but also because of inherent data properties that allow the data to be linked with a real or soft identity. Service providers exploit these facts to monitor and identify users in order to provide personalized services, at the expense of users’ anonymity, relying mainly on personally identifiable information or on sensors that generate unique data. In this paper, we report on the feasibility of user identification using general system features such as memory, CPU and network data, as provided by the underlying operating system. We provide a general framework based on supervised machine learning algorithms both for distinguishing users and for informing them about their anonymity exposure. We conduct a series of experiments to collect trial datasets of users’ engagement on a shared computing platform. We evaluate various well-known classifiers in terms of their effectiveness in distinguishing users, and we perform a sensitivity analysis of their configuration to discover optimal settings under diverse conditions. Furthermore, we examine the bounds on data sampling needed to eliminate the chances of user identification and thus promote anonymity. Overall, the results show that under certain configurations users’ anonymity can be preserved, while in other cases users’ identity can be inferred with high accuracy, without relying on personally identifiable information.

Highlights

  • The proliferation of online applications and services available to end users, combined with their continuous, high-frequency usage, has led to an explosion in the amount of user-generated data that are subject to monitoring and recording.

  • Orthogonally, if the appropriate parameters are taken into account, users’ anonymity could be preserved as well.

  • Whereas such data have been utilized for system management, to the best of our knowledge no previous work has studied the effectiveness of user identification by means of such data or, alternatively, whether users could be monitored.


Summary

Introduction

The proliferation of online applications and services available to end users, combined with their continuous, high-frequency usage, has led to an explosion in the amount of user-generated data that are subject to monitoring and recording. This ever-increasing amount of highly granular data that refer to personal and private user activities poses a great risk to user privacy, since they can be exploited to reveal users’ activities and threaten their anonymity. GPS sensors generate unique data that can be exploited for user identification [8]. Other approaches operate on users’ microdata to perform user identification [11].

