De-identifying Data Research Articles

The open-science movement seeks to make research more transparent and accessible. To that end, researchers are increasingly expected to share de-identified data with other scholars for review, reanalysis, and reuse. In psychology, open-science practices have been explored primarily within the context of quantitative data, but demands to share qualitative data are becoming more prevalent. Narrative data are far more challenging to de-identify fully, and because qualitative methods are often used in studies with marginalized, minoritized, and/or traumatized populations, data sharing may pose substantial risks for participants if their information can be later reidentified. To date, there has been little guidance in the literature on how to de-identify qualitative data. To address this gap, we developed a methodological framework for remediating sensitive narrative data. This multiphase process is modeled on common qualitative-coding strategies. The first phase includes consultations with diverse stakeholders and sources to understand reidentifiability risks and data-sharing concerns. The second phase outlines an iterative process for recognizing potentially identifiable information and constructing individualized remediation strategies through group review and consensus. The third phase includes multiple strategies for assessing the validity of the de-identification analyses (i.e., whether the remediated transcripts adequately protect participants’ privacy). We applied this framework to a set of 32 qualitative interviews with sexual-assault survivors. We provide case examples of how blurring and redaction techniques can be used to protect names, dates, locations, trauma histories, help-seeking experiences, and other information about dyadic interactions.
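To make the distinction between redaction and blurring concrete, here is a minimal Python sketch. It is not the authors' framework, which relies on stakeholder consultation, group review, and consensus rather than automatic rules. The function names, the placeholder tags, the single supported date format, and the sample sentence are all assumptions made for illustration.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|August|"
    "September|October|November|December"
)
# Matches dates written like "March 3, 2019" (one assumed format only).
DATE_PATTERN = re.compile(rf"\b(?:{MONTHS})\s+\d{{1,2}},\s+(\d{{4}})\b")


def redact_terms(text, terms, placeholder):
    """Redaction: replace each listed term with a fixed placeholder."""
    # Longest terms first, so a full name is replaced before any shorter part.
    for term in sorted(terms, key=len, reverse=True):
        pattern = rf"\b{re.escape(term)}\b"
        text = re.sub(pattern, lambda _m: placeholder, text, flags=re.IGNORECASE)
    return text


def blur_dates(text):
    """Blurring: keep the year of a full date and drop the month and day."""
    return DATE_PATTERN.sub(lambda m: f"[DATE: {m.group(1)}]", text)


def remediate(text, names, locations):
    """Apply redaction to names and locations, then blur dates."""
    text = redact_terms(text, names, "[NAME]")
    text = redact_terms(text, locations, "[LOCATION]")
    return blur_dates(text)


# Hypothetical sentence, not taken from any study data.
sample = "On March 3, 2019, Dana met the advocate at Riverside Clinic."
print(remediate(sample, names=["Dana"], locations=["Riverside Clinic"]))
# prints: On [DATE: 2019], [NAME] met the advocate at [LOCATION].
```

Pattern matching of this kind catches only identifiers that someone has already listed or that follow a fixed format. Indirect identifiers in narrative data, such as a distinctive sequence of events, still need the human review that the abstract describes.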

Background: The secondary use of health data is central to biomedical research in the era of data science and precision medicine. National and international initiatives, such as the Global Open Findable, Accessible, Interoperable, and Reusable (GO FAIR) initiative, are supporting this approach in different ways (e.g., making the sharing of research data mandatory or improving the legal and ethical frameworks). Preserving patients’ privacy is crucial in this context. De-identification and anonymization are the two most common terms used to refer to the technical approaches that protect privacy and facilitate the secondary use of health data. However, it is difficult to find a consensus on the definitions of the concepts or on the reliability of the techniques used to apply them. A comprehensive review is needed to better understand the domain, its capabilities, its challenges, and the balance of risk between the data subjects’ privacy on one side and the benefit of scientific advances on the other.

Objective: This work aims to better understand how the research community comprehends and defines the concepts of de-identification and anonymization. A rich overview should also provide insights into the use and reliability of the methods. Six aspects will be studied: (1) terminology and definitions, (2) backgrounds and places of work of the researchers, (3) reasons for anonymizing or de-identifying health data, (4) limitations of the techniques, (5) legal and ethical aspects, and (6) recommendations of the researchers.

Methods: Based on a scoping review protocol designed a priori, MEDLINE was searched for publications discussing de-identification or anonymization and published between 2007 and 2017. The search was restricted to MEDLINE to focus on the life sciences community. The screening process was performed by two reviewers independently.

Results: After searching 7972 records that matched at least one search term, 135 publications were screened and 60 full-text articles were included. (1) Terminology: Definitions of the terms de-identification and anonymization were provided in fewer than half of the articles (29/60, 48%). When both terms were used (41/60, 68%), their meanings divided the authors into two equal groups (19/60, 32%, each) with opposed views. The remaining articles (3/60, 5%) were equivocal. (2) Backgrounds and locations: Research groups were based predominantly in North America (31/60, 52%) and in the European Union (22/60, 37%). The authors came from 19 different domains; computer science (91/248, 36.7%), biomedical informatics (47/248, 19.0%), and medicine (38/248, 15.3%) were the most prevalent ones. (3) Purpose: The main reason declared for applying these techniques is to facilitate biomedical research. (4) Limitations: Progress is being made on specific techniques but, overall, limitations remain numerous. (5) Legal and ethical aspects: Differences exist between nations in the definitions, approaches, and legal practices. (6) Recommendations: A combination of organizational, legal, ethical, and technical approaches is necessary to protect health data.

Conclusions: Interest in privacy-enhancing techniques is growing in the life sciences community. This interest crosses scientific boundaries, involving primarily computer science, biomedical informatics, and medicine. The variability observed in the use of the terms de-identification and anonymization emphasizes the need for clearer definitions as well as for better education and dissemination of information on the subject. The same observation applies to the methods. Several laws, such as the American Health Insurance Portability and Accountability Act (HIPAA) and the European General Data Protection Regulation (GDPR), regulate the domain. Using the definitions they provide could help address the variable use of these two concepts in the research community.

Articles published on De-identifying Data

A novel encryption protocol for facilitating de-identification of genomics health data

Exploring Freely Available Data Tools to Support Open Data and Open Science

Open-Science Guidance for Qualitative Research: An Empirically Validated Approach for De-Identifying Sensitive Narrative Data

A framework for de-identification of free-text data in electronic medical records enabling secondary use.

Research Goal-Driven Data Model and Harmonization for De-Identifying Patient Data in Radiomics.

Attacker models with a variety of background knowledge to de-identified data

Use and Understanding of Anonymization and De-Identification in the Biomedical Literature: Scoping Review.

Dating and Sexual Violence Research in the Schools: Balancing Protection of Confidentiality with Supporting the Welfare of Survivors.

Clinical trial transparency: many gains but access to evidence for new medicines remains imperfect.

An approach for de-identification of point locations of livestock premises for further use in disease spread modeling

The project data sphere initiative: accelerating cancer research by sharing data.

Building public trust in uses of Health Insurance Portability and Accountability Act de-identified data

SIG 13 Perspectives Vol. 20, No. 2, June 2011

An integrated framework for de-identifying unstructured medical data

Needs Assessment for Functionalities in Electronic Health Record Systems in General Hospitals
