Abstract

Faced with the emergence of the Covid-19 pandemic, and to better understand and contain the disease’s spread, health organisations increased their collaboration with other organisations, sharing health data with data scientists and researchers. Data analysis helps such organisations obtain information that supports decision-making processes. For this purpose, both national and regional health authorities provided health data for further processing and analysis. Shared data must comply with existing data protection and privacy regulations. Therefore, a robust de-identification procedure must be used, and a re-identification risk analysis should also be performed. De-identified data embodies state-of-the-art approaches in Data Protection by Design and by Default because it requires the protection of both direct and indirect identifiers, not just direct ones. This article highlights the importance of assessing re-identification risk before data disclosure by analysing a data set of individuals infected by Covid-19 that was made available for research purposes. We stress that it is highly important to make this data available for research purposes and that this process should be based on state-of-the-art methods in Data Protection by Design and by Default. Our main goal is to consider different re-identification risk analysis scenarios, since the information available to the intruder is unknown. Our conclusions show that there is a risk of identity disclosure in all of the studied scenarios. For one scenario in particular, we present an example of a re-identification attack. The outcome of this attack reveals that it is possible to identify individuals without much effort.
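To make the notion of re-identification risk concrete, the following is a minimal sketch of one common way to estimate it: grouping records by their indirect (quasi-) identifiers and treating the risk of each record as the reciprocal of its group size. This is an illustration under stated assumptions, not the procedure used in the article. The toy records, the choice of `age_band`, `sex` and `municipality` as quasi-identifiers, and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical toy records; the attribute names and values are assumptions
# chosen for illustration and do not come from the article's data set.
records = [
    {"age_band": "30-39", "sex": "F", "municipality": "A"},
    {"age_band": "30-39", "sex": "F", "municipality": "A"},
    {"age_band": "40-49", "sex": "M", "municipality": "B"},
    {"age_band": "80-89", "sex": "F", "municipality": "C"},
]

# Assumed quasi-identifiers: attributes an intruder might also find elsewhere.
QUASI_IDENTIFIERS = ("age_band", "sex", "municipality")


def equivalence_class_sizes(rows, keys):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def risk_summary(rows, keys):
    """Summarise risk, taking each record's risk as 1 / (size of its class)."""
    sizes = equivalence_class_sizes(rows, keys)
    per_record_risk = [1 / sizes[tuple(row[k] for k in keys)] for row in rows]
    return {
        "k": min(sizes.values()),  # smallest class size (k-anonymity level)
        "unique_records": sum(1 for r in per_record_risk if r == 1.0),
        "max_risk": max(per_record_risk),
        "mean_risk": sum(per_record_risk) / len(per_record_risk),
    }


summary = risk_summary(records, QUASI_IDENTIFIERS)
print(summary)
```

In this sketch, a record that is unique on its quasi-identifiers has a risk of 1, which is the situation a linkage attack against an external data set would exploit. Different attack scenarios can be approximated by changing which attributes are listed in `QUASI_IDENTIFIERS`.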

Highlights

  • On the last day of December 2019, the first reports of the Coronavirus disease, known as Covid-19 [1], emerged in China

  • We start by presenting the re-identification risk results throughout the provided data set, considering two attack scenarios

  • Fundamental privacy rights in a pandemic state deaths considering a realistic scenario and how many cases we can re-identify by linking to the external data set

Summary

Introduction

On the last day of December 2019, the first reports of the Coronavirus disease, known as Covid-19 [1], emerged in China. This virus is highly contagious and is mostly transmitted between humans by aerosols [2]. The Covid-19 outbreak quickly spread across the world, leading to an increase in the number of people infected by the virus and in the number of deaths. For this reason, the World Health Organization declared Covid-19 a pandemic [3]. To help combat the Coronavirus, scientists from different fields contribute daily with their skills and expertise. One of the contributions to containing the disease is data analysis, which helps build a more comprehensive understanding of it.

