Abstract

Background

Rare disease patients often struggle to find both medical advice and emotional support for their diagnosis. Consequently, many rare disease patient support forums have appeared on hospital webpages, social media sites, and rare disease foundation sites. However, we argue that engagement in these groups may pose a healthcare data privacy threat to many participants, because it makes a number of patients' indirect identifiers ‘readily available’ alongside their rare disease conditions. This information creates a risk of re-identification: a motivated attacker may be able to use the unique combination of a patient’s identifiers and disease condition to re-identify them in anonymized data.

Results

To assess this risk of re-identification, patient direct and indirect identifiers were mined from patient support forums for 80 patients across eight rare diseases. This data mining consisted of scanning patient testimonials, social media sites, and public records to collect the identifiers linked to each rare disease patient. The number of people in the United States who may share each patient’s combination of marital status, 3-digit ZIP code, age, sex, and rare disease condition was then estimated, because these fields commonly remain in health records that have been de-identified under HIPAA’s ‘Safe Harbor’ method. By these estimates, nearly 75% of patients could be at high risk of re-identification in any healthcare dataset in which they appear, owing to their unique combination of identifiers.

Conclusions

The results of this study show that these rare disease patients, through their choice to support their community, are putting all of their healthcare data at risk of re-identification.
This paper demonstrates how simple adjustments to participation guidelines in such support forums, combined with improved privacy measures at the organizational level, could mitigate this risk of re-identification. Additionally, this paper suggests that future work investigate treating certain ‘risky’ International Classification of Diseases (ICD) codes as quasi-identifiers in de-identified datasets, to further protect patients’ privacy while preserving the utility of rare disease support groups.
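The estimation step described in the Results can be illustrated with a minimal sketch. It first generalizes a record the way HIPAA Safe Harbor requires for ZIP codes and ages (first three ZIP digits kept only when that area holds more than 20,000 people, ages over 89 collapsed into one category), then estimates how many people share the remaining attributes. The multiplication of independent proportions is our simplifying assumption, not necessarily the authors' method, and the function names and all numeric inputs below are hypothetical.

```python
from dataclasses import dataclass

# Safe Harbor keeps the first three ZIP digits only when the area they cover
# has more than 20,000 residents; otherwise the 3-digit ZIP becomes "000".
ZIP3_MIN_POPULATION = 20_000


@dataclass(frozen=True)
class QuasiIdentifiers:
    zip3: str
    age: str
    sex: str
    marital_status: str
    diagnosis: str


def safe_harbor_generalize(zip5: str, age: int, sex: str, marital_status: str,
                           diagnosis: str, zip3_population: int) -> QuasiIdentifiers:
    """Generalize ZIP code and age roughly as HIPAA Safe Harbor requires (hypothetical helper)."""
    zip3 = zip5[:3] if zip3_population > ZIP3_MIN_POPULATION else "000"
    # Ages over 89 must be aggregated into a single "90 or older" category.
    age_label = "90+" if age > 89 else str(age)
    # Sex, marital status, and diagnosis are not among the 18 Safe Harbor
    # identifiers, so they pass through unchanged.
    return QuasiIdentifiers(zip3, age_label, sex, marital_status, diagnosis)


def estimate_group_size(zip3_population: int, p_age: float, p_sex: float,
                        p_marital: float, prevalence: float) -> float:
    """Expected number of people sharing all five attributes (hypothetical helper).

    Assumes the attributes are statistically independent, so the ZIP3
    population is thinned by each proportion in turn.
    """
    return zip3_population * p_age * p_sex * p_marital * prevalence


# Hypothetical inputs, not figures from the study.
record = safe_harbor_generalize("02139", 34, "F", "married",
                                "cystic fibrosis", zip3_population=500_000)
size = estimate_group_size(500_000, p_age=0.013, p_sex=0.5,
                           p_marital=0.5, prevalence=1e-5)
print(record)
print(f"estimated group size: {size:.4f}")
```

An estimate well below one, as in this toy case, is what the Highlights describe as a group size of less than one. The independence assumption is crude: for a sex-linked condition such as male breast cancer, for example, sex and diagnosis are strongly correlated.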

Highlights

  • Rare disease patients often struggle to find both medical advice and emotional support for their diagnosis

  • A group size of less than one means that more identifiers are present than are needed to single out one individual. Most patients are at high risk, and 57.5% of patients had a group size of less than one, so they are likely the only person in the entire US population with their combination of indirect identifiers. This figure may be slightly inflated because the study included some deceased patients, but deceased patients are inherently more discoverable, as a result of the greater ease of collecting data from obituaries and the additional redactions that usually accompany date of death in de-identified Health Insurance Portability and Accountability Act (HIPAA)-compliant datasets

  • This study revealed that many more patient stories could be collected without excessive effort for some rare disease conditions like cystic fibrosis (CF), acute lymphocytic leukemia (ALL), Huntington’s chorea, and male breast cancer
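The group-size figures in the Highlights amount to simple proportions over the 80 patients; 57.5% of 80 is 46 patients. The sketch below shows that tally. The high-risk threshold is a hypothetical parameter, since the cut-off the study used for "high risk" is not stated in this excerpt, and the group sizes are synthetic values chosen only to reproduce the 46-of-80 count, not the study's data.

```python
def summarize_group_sizes(group_sizes: list[float],
                          high_risk_threshold: float) -> tuple[float, float]:
    """Return (share with group size < 1, share with group size < threshold).

    high_risk_threshold is a hypothetical cut-off chosen by the analyst.
    """
    n = len(group_sizes)
    if n == 0:
        raise ValueError("need at least one patient")
    # A group size below one suggests the patient is likely unique in the population.
    likely_unique = sum(1 for g in group_sizes if g < 1)
    high_risk = sum(1 for g in group_sizes if g < high_risk_threshold)
    return likely_unique / n, high_risk / n


# Synthetic group sizes for 80 patients (not the study's data):
# 46 below one, 10 small but above one, 24 comfortably large.
sizes = [0.3] * 46 + [3.0] * 10 + [40.0] * 24
unique_share, high_share = summarize_group_sizes(sizes, high_risk_threshold=5)
print(f"{unique_share:.1%} likely unique, {high_share:.1%} below threshold")
```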


Summary

Introduction

Rare disease patients often struggle to find both medical advice and emotional support for their diagnosis. Support groups and websites have been created for many of the more than 7,000 ‘Rare and Orphan Diseases’ recognized in the United States, where the Orphan Drug Act of 1983 defines such diseases as those with fewer than 200,000 cases in the country [7]. As these groups continue to grow, patients post an increasing number of diagnosis and treatment stories in these open-access forums to discuss treatment options and to offer hope and emotional support to other patients, all with the expectation that these “institutions will and should recognize their right for their privacy” [2, 8]. Because a rare disease diagnosis is uncommon by definition, combining it with a patient's other indirect identifiers can make that patient's record unique. This uniqueness increases the chance of success for a motivated attacker attempting to re-identify records via a prosecutor, marketer, or journalist attack, and therefore increases the risk of re-identification [9].
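To make the link between uniqueness and attack success concrete, here is a minimal sketch of the prosecutor model only. Records that share the same quasi-identifier values form an equivalence class, and an attacker who knows the target is in the dataset can do no better than pick one member of the target's class, so the per-record risk is 1 divided by the class size. The journalist and marketer models use different calculations (population-level class sizes and an average over all records, respectively) and are not shown. The function name, field names, and records are hypothetical.

```python
from collections import Counter


def prosecutor_risks(records: list[dict], quasi_identifiers: list[str]) -> list[float]:
    """Per-record re-identification probability under the prosecutor model.

    Hypothetical helper: each record's risk is 1 / (number of records in the
    dataset sharing its quasi-identifier values).
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    return [1.0 / class_sizes[k] for k in keys]


# Hypothetical de-identified records.
records = [
    {"zip3": "021", "age": "34", "sex": "F", "diagnosis": "cystic fibrosis"},
    {"zip3": "021", "age": "34", "sex": "F", "diagnosis": "cystic fibrosis"},
    {"zip3": "021", "age": "34", "sex": "F", "diagnosis": "asthma"},
]

# Treating the diagnosis as a quasi-identifier splits the records into two classes.
with_dx = prosecutor_risks(records, ["zip3", "age", "sex", "diagnosis"])
print(with_dx)        # [0.5, 0.5, 1.0]
print(max(with_dx))   # worst-case prosecutor risk: 1.0

# Without the diagnosis, all three records fall into one class (risk 1/3 each).
print(prosecutor_risks(records, ["zip3", "age", "sex"]))
```

The comparison between the two calls shows why a distinctive diagnosis behaves like a quasi-identifier: adding it to the matching key shrinks equivalence classes and raises the worst-case risk.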

