Can censoring of research outputs be automated to ensure robust data protection?

Michael Nicholas, Luke Player, Kerry Bailey, Kelly Nock, Helen Thomas, Craig Barker, Chris Davies

doi:10.23889/ijpds.v1i1.282

Abstract

Background

Guidance on research outputs recommends censoring so that, even when anonymised linked data are aggregated, no cell contains fewer than 5-10 units. This is recommended to decrease the likelihood of re-identification. Leaving those cells empty is not adequate if other cells can be used to work out the numerical value of the censored cell. Some outputs require a large number of tables to be exported, as was the case for a research study whose outputs included several large tables that drove a front-end interactive visualisation. This issue will become more common as linked data outputs are used to make operational decisions, which requires timely output of large amounts of aggregated data. Human scanning of all tables may not be time- or cost-effective and is subject to human error.

Approach

Many methods of censoring were considered, including Barnardisation (adding or subtracting 1 randomly to small numbers), suppression, and a combination of methods. It was then necessary to code the methods to ensure that censoring was implemented in all cells in the output and that the output was still meaningful. It was also necessary to check the outputs for quality and to introduce an ‘audit’ system to ensure that quality was maintained without affecting the outputs of the findings.

Discussion

Software engineers were able to develop an algorithm that performed safe censoring using a level of ‘10 or under’. It also ensured that the statistical tables were still functional. The presentation will describe how this was done and demonstrate some examples of the impact on the output. Some stakeholders felt that the censoring of the anonymised aggregated data went beyond the ‘reasonable effort’ required to re-identify individuals. Some expressed the opinion that the loss of detail and the missing data that this method produces are excessive, with detail sacrificed for the sake of minimal risk. Some stakeholders felt the risks had been allowed to outweigh the societal benefits. The team were assured that, although the censoring may be considered excessive by some, it did ensure safe censoring and offered as low a risk as possible of re-identification. However, routine implementation of this method has not been agreed.
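The abstract does not give the algorithm itself, so the following Python is only a minimal sketch of the two ideas it names, not the authors' implementation. It assumes a single table row whose total is published alongside it, treats every count at or below the threshold (including zero) as a small cell, and uses hypothetical function names (`suppress_row`, `barnardise`). The secondary step illustrates why leaving one small cell empty is not adequate: with a published total, a lone empty cell can be recovered by subtraction.

```python
import random

THRESHOLD = 10  # assumption: mirrors the '10 or under' level in the abstract


def suppress_row(counts, threshold=THRESHOLD):
    """Mask small cells in one row of counts; masked cells become None.

    Primary step: mask every count at or below the threshold.
    Secondary step: if exactly one cell was masked, also mask the smallest
    remaining cell, so the hidden value cannot be recovered by subtracting
    the visible cells from a published row total.
    """
    masked = [None if c <= threshold else c for c in counts]
    hidden = [i for i, c in enumerate(masked) if c is None]
    if len(hidden) == 1 and len(counts) > 1:
        visible = [i for i, c in enumerate(masked) if c is not None]
        smallest = min(visible, key=lambda i: counts[i])
        masked[smallest] = None
    return masked


def barnardise(counts, threshold=THRESHOLD, rng=None):
    """Add or subtract 1 at random to each small count, as the abstract
    describes Barnardisation. Clamping at zero is an assumption made here
    to avoid negative counts."""
    rng = rng or random.Random()
    return [
        max(0, c + rng.choice((-1, 1))) if c <= threshold else c
        for c in counts
    ]


if __name__ == "__main__":
    print(suppress_row([3, 40, 25, 12]))
    print(barnardise([5, 50], rng=random.Random(0)))
```

A real table would need the same check applied across columns and across related tables as well, which is part of what makes manual scanning of many large tables error-prone.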

Journal: International Journal of Population Data Science
Publication Date: Apr 18, 2017
License type: CC BY-NC-ND 4.0
