Developing data governance standards for using free-text data in research (TexGov)

Kerina Jones, Emma Squires, Sharon Heys, Elizabeth Ford, Lucy Griffiths, Nathan Lea

doi:10.23889/ijpds.v4i3.1332

Abstract

Background

Free-text data represent a vast, untapped source of rich information to guide research and public service delivery. Free-text data contain a wealth of additional detail that, if more accessible, would clarify and supplement information coded in structured data fields. Personal data usually need to be de-identified or anonymised before they can be used for purposes such as audit and research, but there are major challenges in finding effective methods to de-identify free text that do not damage data utility as a by-product. The main aim of the TexGov project is to work towards data governance standards to enable free-text data to be used safely for public benefit.

Methods

We conducted a rapid literature review to explore the data governance models used in working with free-text data, along with case studies of systems that make de-identified free-text data available for research. We engaged with text mining researchers and the general public to explore barriers and solutions in working with free text, and we outlined UK data protection legislation and regulations for context.

Results

We reviewed 50 articles and the models of 4 systems providing access to de-identified free text. The main emerging themes were: i) patient involvement at identifiable and de-identified data stages; ii) questions of consent and notification for the reuse of free-text data; iii) working with identifiable data for Natural Language Processing algorithm development; and iv) de-identification methods and thresholds of reliability.

Conclusion

We have proposed a set of recommendations, including: ensuring public transparency in data flows and uses; adhering to the principles of minimal data extraction; treating de-identified blacklisted free text as potentially identifiable, with use limited to accredited data safe havens; and committing to a culture of continuous improvement to understand the relationships between accuracy of de-identification and re-identification risk, so that this can be communicated to all stakeholders.

Journal: International Journal of Population Data Science
Publication Date: Nov 26, 2019
Citations: 1
License type: CC BY 4.0

Similar Papers

Toward the Development of Data Governance Standards for Using Clinical Free-Text Data in Health Research: Position Paper.
Kerina H Jones ... Nathan Lea
Journal of Medical Internet Research | VOL. 22
29 Jun 2020

De-identification of free text data containing personal health information: a scoping review of reviews.
Bekelu Negash ... Moniruzzaman Moni
International Journal of Population Data Science | VOL. 8
12 Dec 2023

A framework for de-identification of free-text data in electronic medical records enabling secondary use.
Louis Mercorelli ... Jonathan Morris
Australian Health Review | VOL. 46
01 Jan 2021

Data protection legislation in Africa and pathways for enhancing compliance in big data health research
Nchangwi Syntia Munung ... Ambroise Wonkam
Health Research Policy and Systems | VOL. 22
15 Oct 2024
