Anticipating anonymity in screening program databases

Rafael Caballero, Sagar Sen, Jan F Nygård

doi:10.1016/j.ijmedinf.2017.04.003

Abstract

In this paper, we propose a technique for improving anonymity in screening program databases in order to increase the privacy of the participants in these programs. The data generated by the invitation process (screening centre, appointment date) is often made available to researchers for medical research and for the evaluation and improvement of the screening program. This information, combined with other personal quasi-identifiers such as ZIP code, gender or age, can pose a risk of disclosing the identity of the individuals participating in the program and, potentially, their test results. We present two algorithms that produce a set of screening appointments intended to increase the anonymity of the resulting dataset. The first, based on the constraint programming paradigm, computes the optimal appointments, while the second is a suboptimal heuristic algorithm that can be used with real-size datasets. The level of anonymity is measured using the new concept of generalized k-anonymity, which allows us to show the utility of the proposal by means of experiments using both random data and data based on screening invitations from the Norwegian Cancer Registry.
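To make the disclosure risk concrete, the following minimal Python sketch computes standard k-anonymity, that is, the size of the smallest group of records sharing the same quasi-identifier values. It is not the generalized k-anonymity introduced in the paper, which the abstract does not define. The function name, field names and records are hypothetical and chosen only for illustration.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the same
    values for the given quasi-identifiers (standard k-anonymity)."""
    if not records:
        return 0
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


# Hypothetical invitation records; every value here is made up.
invitations = [
    {"zip": "0150", "age": 52, "centre": "A", "date": "2017-03-01"},
    {"zip": "0150", "age": 52, "centre": "A", "date": "2017-03-01"},
    {"zip": "0150", "age": 52, "centre": "B", "date": "2017-03-02"},
    {"zip": "0151", "age": 60, "centre": "B", "date": "2017-03-02"},
    {"zip": "0151", "age": 60, "centre": "B", "date": "2017-03-02"},
]

# Anonymity over the personal quasi-identifiers alone.
k_personal = k_anonymity(invitations, ["zip", "age"])

# Anonymity once the appointment data is released alongside them.
k_combined = k_anonymity(invitations, ["zip", "age", "centre", "date"])
```

In this toy dataset the personal quasi-identifiers alone give k = 2, but adding the screening centre and appointment date isolates one participant and k drops to 1. Assigning that participant the same appointment as the two others with the same ZIP code and age would keep k at 2, which is the kind of effect the appointment-scheduling algorithms are meant to achieve.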

Full Text