Dynamic Privacy-Preserving Recommendations on Academic Graph Data

Erasmo Purificato, Ernesto William De Luca, Sabine Wehnert

doi:10.3390/computers10090107

Abstract

In the age of digital information, where the internet and social networks, as well as personalised systems, have become an integral part of everyone’s life, it is often challenging to be aware of the amount of data produced daily and, unfortunately, of the potential risks caused by the indiscriminate sharing of personal data. Recently, attention to privacy has grown thanks to the introduction of specific regulations such as the European GDPR. In some fields, including recommender systems, this has inevitably led to a decrease in the amount of usable data and, occasionally, to significant degradation in performance, mainly because information is no longer attributable to specific individuals. In this article, we present a dynamic privacy-preserving approach for recommendations in an academic context. We aim to implement a personalised system capable of protecting personal data while at the same time allowing sensible and meaningful use of the available data. The proposed approach introduces several pseudonymisation procedures based on the design goals described by the European Union Agency for Cybersecurity in their guidelines, in order to dynamically transform entities (e.g., persons) and attributes (e.g., authored papers and research interests) in such a way that any user processing the data is not able to identify individuals. We present a case study using data from researchers of the Georg Eckert Institute for International Textbook Research (Brunswick, Germany). Building a knowledge graph and exploiting a Neo4j database for data management, we first generate several pseudoN-graphs, i.e., graphs with different rates of pseudonymised persons. Then, we evaluate our approach by leveraging the graph embedding algorithm node2vec to produce recommendations through node relatedness.
The recommendations provided by the graphs in different privacy-preserving scenarios are compared with those provided by the fully non-pseudonymised graph, which serves as the baseline of our evaluation. The experimental results show that, despite the structural modifications to the knowledge graph caused by the de-identification processes, the approach proposed in this article allows significant performance to be preserved in terms of precision.
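
As a rough illustration of the two steps named in the abstract (building a graph in which a given share of persons is pseudonymised, then comparing its recommendations with those of the baseline graph), a minimal Python sketch follows. It is not the authors' implementation. The keyed-hash pseudonym (HMAC-SHA256), the edge-list graph representation, the function names and the precision-at-k definition are all assumptions made for this example. The Neo4j storage and the node2vec embedding step are omitted, so the recommendation lists are supplied by hand.

```python
import hashlib
import hmac
import random


def pseudonym(name: str, secret: bytes) -> str:
    """Keyed-hash pseudonym (HMAC-SHA256). Assumed technique, chosen for illustration."""
    digest = hmac.new(secret, name.encode("utf-8"), hashlib.sha256).hexdigest()
    return "person_" + digest[:12]


def pseudo_n_graph(edges, persons, rate, secret, seed=0):
    """Return (edges, mapping) with about `rate` of the persons pseudonymised.

    `edges` is a list of (source, target) node-name pairs; `persons` is the
    set of node names that denote people. Both are hypothetical structures.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(persons), round(rate * len(persons)))
    mapping = {p: pseudonym(p, secret) for p in chosen}
    # Replace every occurrence of a chosen person; other nodes stay untouched.
    new_edges = [(mapping.get(u, u), mapping.get(v, v)) for u, v in edges]
    return new_edges, mapping


def precision_at_k(recommended, baseline, k):
    """Share of the top-k recommendations that also occur in the baseline top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(baseline[:k])
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


# Toy data, invented for this example only.
edges = [
    ("Alice", "paper_1"),
    ("Bob", "paper_1"),
    ("Bob", "topic_graphs"),
    ("Carol", "topic_graphs"),
    ("Dave", "paper_2"),
]
persons = {"Alice", "Bob", "Carol", "Dave"}

pseudo_edges, mapping = pseudo_n_graph(edges, persons, rate=0.5, secret=b"demo-key")
score = precision_at_k(["a", "b", "c"], ["a", "c", "d"], k=3)
```

In this sketch the same secret always yields the same pseudonym for a given person, so links between a pseudonymised person and their papers or interests survive the transformation, while the name itself is no longer visible in the graph.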

Highlights

Our work aims to retain the performance of a recommender system while allowing complete personal data protection.
We present a dynamic approach for privacy-preserving recommendations that aims to retain performance in an academic recommender system while allowing complete personal data protection, in accordance with the GDPR provisions.
We propose a de-identification approach based on several European guidelines and state-of-the-art works on pseudonymisation techniques in order to dynamically transform entities and attributes in such a way that any user working on or processing the data will not be able to identify individuals, but can still use the data in a meaningful manner.

Summary

Introduction

Personalised systems, social network platforms and search engines can be considered among the most widespread technologies of the last two decades. Their ubiquity and soaring popularity have led to, and are in turn driven by, a massive amount of personal data, opinions, and professional and individual interests shared by users in several contexts, from e-commerce to academic research. The advent and fast spread of recommender systems have contributed significantly to the growth of interest in retrieving relevant, personalised information in the scientific environment, mainly in terms of expert [1,2] and paper recommendations [3,4,5]. In the current era of big data and information overload, such systems can help in navigating the mass of content created on a daily basis, especially for academics, for whom not being aware of relevant related work, experts or research projects is a common problem.

Objectives

Results

Discussion

Conclusion