Abstract

Background: De-identifying personal information is critical when using personal health data for secondary research. The Observational Medical Outcomes Partnership Common Data Model (CDM), defined by the nonprofit organization Observational Health Data Sciences and Informatics, has been gaining attention for its use in the analysis of patient-level clinical data obtained from various medical institutions. When analyzing such data in a public environment such as a cloud-computing system, an appropriate de-identification strategy is required to protect patient privacy.

Objective: This study proposes and evaluates a de-identification strategy that comprises several rules along with privacy models such as k-anonymity, l-diversity, and t-closeness. The proposed strategy was evaluated using an actual CDM database.

Methods: The CDM database used in this study was constructed by the Anam Hospital of Korea University. Analysis and evaluation were performed using the ARX anonymizing framework in combination with the k-anonymity, l-diversity, and t-closeness privacy models.

Results: The CDM database, which was constructed according to the rules established by Observational Health Data Sciences and Informatics, exhibited a low risk of re-identification: the highest re-identifiable record rate (11.3%) in the dataset was exhibited by the DRUG_EXPOSURE table, with a re-identification success rate of 0.03%. However, because all tables include at least one “highest risk” value of 100%, suitable anonymizing techniques are required; moreover, the CDM database preserves the “source values” (raw data), a combination of which could increase the risk of re-identification. Therefore, this study proposes an enhanced strategy to de-identify the source values to significantly reduce not only the highest risk in the k-anonymity, l-diversity, and t-closeness privacy models but also the overall possibility of re-identification.

Conclusions: Our proposed de-identification strategy effectively enhanced the privacy of the CDM database, thereby encouraging clinical research involving multiple centers.
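To make the “highest risk” figure concrete, the following is a minimal sketch rather than the study's method. It assumes the common reading in which a record's re-identification risk is 1 divided by the size of its equivalence class (the group of records sharing the same quasi-identifier values), so a highest risk of 100% means at least one record is unique. The toy records, column names, and the decade-binning rule are all hypothetical; the sketch does not use ARX or the CDM schema, and it omits t-closeness for brevity.

```python
from collections import defaultdict

# Hypothetical toy records; column names are illustrative, not the CDM schema.
RECORDS = [
    {"year_of_birth": 1970, "gender": "F", "drug": "aspirin"},
    {"year_of_birth": 1970, "gender": "F", "drug": "metformin"},
    {"year_of_birth": 1982, "gender": "M", "drug": "aspirin"},
    {"year_of_birth": 1982, "gender": "M", "drug": "aspirin"},
    {"year_of_birth": 1985, "gender": "M", "drug": "warfarin"},
]
QUASI_IDENTIFIERS = ("year_of_birth", "gender")  # assumed quasi-identifiers
SENSITIVE = "drug"                               # assumed sensitive attribute


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for rec in records:
        key = tuple(rec[q] for q in quasi_identifiers)
        classes[key].append(rec)
    return classes


def k_anonymity(classes):
    """k is the size of the smallest equivalence class."""
    return min(len(group) for group in classes.values())


def highest_risk(classes):
    """Worst-case per-record risk: 1 / (smallest class size)."""
    return 1.0 / k_anonymity(classes)


def distinct_l_diversity(classes, sensitive):
    """l is the fewest distinct sensitive values found in any class."""
    return min(len({rec[sensitive] for rec in group}) for group in classes.values())


def generalize_year(records, width=10):
    """Coarsen year_of_birth into fixed-width bins (one simple generalization rule)."""
    return [
        {**rec, "year_of_birth": rec["year_of_birth"] // width * width}
        for rec in records
    ]


raw = equivalence_classes(RECORDS, QUASI_IDENTIFIERS)
print(k_anonymity(raw), highest_risk(raw), distinct_l_diversity(raw, SENSITIVE))
# prints: 1 1.0 1  (the 1985 record is unique, so the highest risk is 100%)

coarse = equivalence_classes(generalize_year(RECORDS), QUASI_IDENTIFIERS)
print(k_anonymity(coarse), highest_risk(coarse), distinct_l_diversity(coarse, SENSITIVE))
# prints: 2 0.5 2  (binning birth years by decade merges the unique record into a class of 3)
```

In this toy case a single generalization step halves the highest risk, but at the cost of coarser birth years; any real rule set has to weigh that loss of detail against the privacy gain.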

Highlights

  • The Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM), defined by the nonprofit organization Observational Health Data Sciences and Informatics (OHDSI) [1], is a standard data schema [2,3] that uses standardized terms [4]

  • Because de-identifying personal information is essential when personal health data are used for secondary research, the OMOP-CDM already applies a certain level of de-identification when its database is constructed: the reference architecture provided by OHDSI, OHDSIonAWS [18], uses several anonymization methods to comply with the Health Insurance Portability and Accountability Act (HIPAA) [7]

  • We propose an enhanced de-identification strategy for the OMOP-CDM that comprises a set of rules for privacy models such as k-anonymity, l-diversity, and t-closeness, approached from the perspective of reconnection with other information and of the privacy models


Summary

Introduction

The Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM), defined by the nonprofit organization Observational Health Data Sciences and Informatics (OHDSI) [1], is a standard data schema [2,3] that uses standardized terms [4]. Owing to increasing system complexity and service availability, operators generally prefer to run the CDM database in a cloud-computing environment. This trend has prompted regulatory and legal considerations regarding network accessibility [7]. The CDM has been gaining attention for its use in the analysis of patient-level clinical data obtained from various medical institutions. When such data are analyzed in a public environment such as a cloud-computing system, an appropriate de-identification strategy is required to protect patient privacy.

