Abstract

To alleviate growing concern about privacy breaches in online services, some systems do not ask users for any demographic information (DI), such as gender or age. Keeping user DI private in such systems seems guaranteed. However, other systems may publish, as a trend report, the statistical preferences of users with different DI, e.g., 80% of the buyers of a product being young females. Intuitively, such statistical preferences raise no privacy risk for the former type of system, since specific personal behaviour or DI cannot be reconstructed from the statistical data. In this paper, however, we show that this is not the case. We propose an unsupervised transfer learning scheme that learns multidimensional DI vectors for individual users and topics from external statistical preferences. To enable unsupervised learning, we apply the scheme to a rating recommendation task and concatenate the DI vector with the implicit preference vector. To compare the privacy risk with and without DI available, we either conceal or retain the DI in the datasets of real rating systems. Experiments show that, in the DI-unavailable scenario, our unsupervised scheme based on external statistical preferences performs almost as well as the corresponding supervised scheme in the DI-available scenario on the same system. This confirms that privacy risks exist even in systems that collect no personal information, because statistical preferences from similar systems are readily available.
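
The architecture described above can be made concrete with a short sketch. Below is a minimal, hypothetical PyTorch model illustrating the idea: each user and item gets an implicit preference embedding plus a learned DI embedding, the two are concatenated before scoring a rating, and an auxiliary loss nudges the mean DI vector of an item's raters toward an externally published demographic mix. All names (DIRatingModel, di_stat_loss), dimensions, and the squared-error objectives are assumptions for illustration, not the authors' exact formulation.

    # Hypothetical sketch of the model described in the abstract, assuming
    # a matrix-factorization-style rating predictor with concatenated
    # implicit-preference and DI embeddings.
    import torch
    import torch.nn as nn

    class DIRatingModel(nn.Module):
        def __init__(self, n_users, n_items, pref_dim=32, di_dim=8):
            super().__init__()
            # implicit preference vectors, learned from ratings alone
            self.user_pref = nn.Embedding(n_users, pref_dim)
            self.item_pref = nn.Embedding(n_items, pref_dim)
            # DI vectors, learned without any collected demographics
            self.user_di = nn.Embedding(n_users, di_dim)
            self.item_di = nn.Embedding(n_items, di_dim)

        def forward(self, users, items):
            # concatenate the DI vector with the implicit preference
            # vector for both sides, then score by dot product
            u = torch.cat([self.user_pref(users), self.user_di(users)], dim=-1)
            v = torch.cat([self.item_pref(items), self.item_di(items)], dim=-1)
            return (u * v).sum(dim=-1)  # predicted rating

    def di_stat_loss(model, raters_by_item, published_stats):
        # Transfer signal from external statistical preferences: push the
        # mean DI vector of an item's raters toward the published
        # demographic mix. `published_stats` maps item id -> target DI
        # vector (an assumed encoding of a trend report).
        loss = torch.tensor(0.0)
        for item, target in published_stats.items():
            raters = raters_by_item[item]            # LongTensor of user ids
            mean_di = model.user_di(raters).mean(dim=0)
            loss = loss + ((mean_di - target) ** 2).sum()
        return loss

Training would then minimize a standard rating loss (e.g., mean squared error against observed ratings) plus this statistical alignment term; the point of the sketch is that DI-like structure can emerge for individual users even though the system itself never collected demographics.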
