Abstract

Background: Multi-site health sciences research is becoming more common, as it enables investigation of rare outcomes and diseases and new healthcare innovations. Multi-site research usually involves the transfer of large amounts of research data between collaborators, which increases the potential for accidental disclosures of protected health information (PHI). Standard protocols for preventing release of PHI are extremely vulnerable to human error, particularly when the shared data sets are large.

Methods: To address this problem, we developed an automated program (SAS macro) to identify possible PHI in research data before it is transferred between research sites. The macro reviews all data in a designated directory to identify suspicious variable names and data patterns. The macro looks for variables that may contain personal identifiers such as medical record numbers and social security numbers. In addition, the macro identifies dates and numbers that may identify people who belong to small groups, who may be identifiable even in the absence of traditional identifiers.

Results: Evaluation of the macro on 100 sample research data sets indicated a recall of 0.98 and precision of 0.81.

Conclusions: When implemented consistently, the macro has the potential to streamline the PHI review process and significantly reduce accidental PHI disclosures.
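The two checks named in the Methods (suspicious variable names and suspicious data patterns) can be illustrated with a minimal sketch. This is not the authors' SAS macro: it is written in Python, and the name fragments, the SSN-like pattern, the function name `flag_columns`, and the sample rows are all assumptions made for illustration.

```python
import re

# Hypothetical rules for illustration only; the actual macro's rules are not
# given in this text.
SUSPICIOUS_NAME = re.compile(r"(ssn|mrn|social|birth|dob|name|address|phone)", re.IGNORECASE)
SSN_PATTERN = re.compile(r"\d{3}-?\d{2}-?\d{4}")


def flag_columns(rows):
    """Return {column: sorted list of reasons} for columns that look like PHI."""
    flags = {}
    for row in rows:
        for column, value in row.items():
            # Check 1: the variable name contains a suspicious fragment.
            if SUSPICIOUS_NAME.search(column):
                flags.setdefault(column, set()).add("suspicious name")
            # Check 2: the whole value is shaped like a social security number.
            if isinstance(value, str) and SSN_PATTERN.fullmatch(value.strip()):
                flags.setdefault(column, set()).add("SSN-like value")
    return {column: sorted(reasons) for column, reasons in flags.items()}


# Made-up sample rows, not real data.
sample = [
    {"patient_mrn": "A1002", "visit_count": "3", "notes": "123-45-6789"},
    {"patient_mrn": "A1003", "visit_count": "5", "notes": "follow-up"},
]
print(flag_columns(sample))
```

A flagged column is only a candidate for manual review. The reported precision of 0.81 indicates that the real macro also produces false positives, so a check of this kind supports a human reviewer rather than replacing one.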

Highlights

  • Multi-site health sciences research is becoming more common, as it enables investigation of rare outcomes and diseases and new healthcare innovations

  • The transferred data sets can range from aggregate counts to patient-level data about encounters, diagnoses and procedures, prescriptions, and lab test results, depending on the research needs, the data use agreement (DUA), and the Institutional Review Board (IRB) agreement

  • The file listing includes a record count for all SAS data sets, as well as the date each data set was created and modified. Both the file count and the file listing can be compared to the expected output described in the program’s workplan to evaluate whether the program has produced the correct data sets and to ensure there are no unexpected files in the transfer directory
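The comparison of the file listing against the workplan's expected output can be sketched as follows. This is an assumption-laden Python illustration, not the macro itself: the function `compare_listing` and the file names are hypothetical, and the sketch compares file names only, without reading record counts or creation and modification dates from SAS data sets.

```python
from pathlib import Path
import tempfile


def compare_listing(transfer_dir, expected_names):
    """Return (missing, unexpected) file names relative to the expected list."""
    actual = {p.name for p in Path(transfer_dir).iterdir() if p.is_file()}
    expected = set(expected_names)
    return sorted(expected - actual), sorted(actual - expected)


# Build a throwaway transfer directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("counts.csv", "scratch.tmp"):
        (Path(tmp) / name).write_text("placeholder")
    missing, unexpected = compare_listing(tmp, ["counts.csv", "encounters.csv"])
    print("missing:", missing)
    print("unexpected:", unexpected)
```

Reporting both directions matters here: a missing file suggests the program did not produce the expected data sets, while an unexpected file is exactly the kind of stray output that could carry PHI out of the site.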


Summary

Introduction

Multi-site health sciences research is becoming more common, as it enables investigation of rare outcomes and diseases and new healthcare innovations. Multi-site research usually involves the transfer of large amounts of research data between collaborators, which increases the potential for accidental disclosures of protected health information (PHI). [...] Datalink [3], and the Centers for Education and Research on Therapeutics [4], the FDA Sentinel project [5] and the Scalable PArtnering Network (SPAN) [6], among others. These collaborations often require the release of aggregated patient data or fully or partially de-identified patient-level information from participating institutions to the lead research site. As public health research collaborations grow more common, the potential for accidental disclosure of PHI grows.

