Abstract

In disciplines such as biomedicine and social sciences, sharing and combining sensitive individual-level data is often prohibited by ethical-legal or governance constraints and other barriers such as the control of intellectual property or the huge sample sizes. DataSHIELD (Data Aggregation Through Anonymous Summary-statistics from Harmonised Individual-levEL Databases) is a distributed approach that allows the analysis of sensitive individual-level data from one study, and the co-analysis of such data from several studies simultaneously without physically pooling them or disclosing any data. Following initial proof of principle, a stable DataSHIELD platform has now been implemented in a number of epidemiological consortia. This paper reports three new applications of DataSHIELD including application to post-publication sensitive data analysis, text data analysis and privacy-protected data visualisation. Expansion of DataSHIELD analytic functionality and application to additional data types demonstrate the broad applications of the software beyond biomedical sciences.
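The idea of co-analysing several studies through summary statistics alone can be illustrated with a minimal sketch. It is not DataSHIELD's actual interface: the names `SiteSummary`, `summarise_site`, `pooled_mean_variance` and the threshold `MIN_CELL_COUNT` are hypothetical and chosen only for illustration. Each site releases a count, a sum and a sum of squares, and the analyst combines them into a pooled mean and sample variance without ever seeing an individual record.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Hypothetical disclosure threshold for illustration; not a DataSHIELD default.
MIN_CELL_COUNT = 5


@dataclass(frozen=True)
class SiteSummary:
    """Aggregates released by one study site (no individual rows)."""
    n: int
    total: float
    total_sq: float


def summarise_site(values: Sequence[float],
                   min_count: int = MIN_CELL_COUNT) -> SiteSummary:
    """Runs at the data owner's site and returns aggregates only."""
    if len(values) < min_count:
        # Refuse to answer when the group is small enough to risk disclosure.
        raise ValueError("too few records to release a summary")
    return SiteSummary(
        n=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
    )


def pooled_mean_variance(summaries: Sequence[SiteSummary]) -> Tuple[float, float]:
    """Runs at the analyst's side and combines the site aggregates."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)  # sample variance
    return mean, variance


if __name__ == "__main__":
    # Two made-up studies; the raw lists never leave summarise_site.
    study_a = summarise_site([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    study_b = summarise_site([2.0, 4.0, 6.0, 8.0, 10.0])
    print(pooled_mean_variance([study_a, study_b]))
```

The pooled estimates are identical to those computed on the physically combined data, because the mean and variance depend on the records only through these three aggregates. A request covering fewer than `MIN_CELL_COUNT` records is rejected rather than answered.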

Highlights

  • Data access and analysis barriers within biomedical and social sciences research can arise for a variety of reasons including: i) ethical-legal restrictions surrounding confidentiality and the sharing of, or access to, disclosive data; ii) intellectual property or licensing issues surrounding research access to raw data; iii) the physical size of the data. There are three processes by which individual-level data in biomedical research is typically shared or accessed (Table 1).

  • We have shown that DataSHIELD uniquely provides a mechanism for the analysis of sensitive data by building in statistical disclosure controls and security measures to meet the requirements of data owners.

  • DataSHIELD does not require the substantial infrastructure that is necessary for a closed repository or data safe haven.

Introduction

Data access and analysis barriers within biomedical and social sciences research can arise for a variety of reasons including: i) ethical-legal restrictions surrounding confidentiality and the sharing of, or access to, disclosive data; ii) intellectual property or licensing issues surrounding research access to raw data; iii) the physical size of the data. There are three processes by which individual-level data (microdata) in biomedical research is typically shared or accessed (Table 1). (encrypted) hard drives; email; direct download; secure FTP; or utilising cloud sharing and storage systems e.g. Google Drive. These release methods may not satisfy the privacy, ethical and legal restrictions or the data security concerns associated with these data. In such examples, these risks are mitigated by applying statistical disclosure limitation (Karr and Reiter, 2014; Shlomo et al., 2015) or anonymisation/pseudonymisation methods (Sweeney, 2002; Elliot et al., 2016) to the data prior to repository release.
