Abstract

Background: Electronic medical records, including pathology reports, are often used for research purposes. Currently, there are few programs freely available to remove identifiers while leaving the remainder of the pathology report text intact. Our goal was to produce an open source, Health Insurance Portability and Accountability Act (HIPAA) compliant, deidentification tool tailored for pathology reports.

Methods: We designed a three-step process for removing potential identifiers. The first step is to look for identifiers known to be associated with the patient, such as name, medical record number, pathology accession number, etc. Next, a series of pattern matches look for predictable patterns likely to represent identifying data, such as dates, accession numbers and addresses as well as patient, institution and physician names. Finally, individual words are compared with a database of proper names and geographic locations. Pathology reports from three institutions were used to design and test the algorithms. The software was improved iteratively on training sets until it exhibited good performance. A set of 1800 new pathology reports was then processed. Each report was reviewed manually before and after deidentification to catalog all identifiers and note those that were not removed.

Results: Of the 1800 pathology reports, 1254 (69.7%) contained identifiers in the body of the report. Of the 3499 unique identifiers in the test set, 3439 (98.3%) were removed. Only 19 HIPAA-specified identifiers (mainly consult accession numbers and misspelled names) were missed. Of 41 non-HIPAA identifiers missed, the majority were partial institutional addresses and ages. Outside consultation case reports typically contain numerous identifiers and were the most challenging to deidentify comprehensively. There was variation in performance among reports from the three institutions, highlighting the need for site-specific customization, which is easily accomplished with our tool.

Conclusion: We have demonstrated that it is possible to create an open-source deidentification program which performs well on free-text pathology reports.
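
The three-step process described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' software: the function names, the replacement tags, the regular expressions and the tiny word list are all hypothetical stand-ins chosen for illustration, and a real tool would need far broader patterns, a large name and location database, and site-specific customization.

```python
import re

# Hypothetical patterns for step 2; real reports need many more, tuned per site.
PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "**DATE**"),
    (re.compile(r"\b[A-Z]{1,2}\d{2}-\d{3,6}\b"), "**ACCESSION**"),
    (re.compile(r"\bDr\.?\s+[A-Z][a-z]+\b"), "**NAME**"),
]

# Hypothetical stand-in for the database of proper names and locations (step 3).
WORD_DATABASE = {
    "smith": "**NAME**",
    "jones": "**NAME**",
    "boston": "**PLACE**",
}


def remove_known_identifiers(text, known):
    """Step 1: replace identifiers already known to belong to the patient."""
    for label, value in known.items():
        if value:
            tag = f"**{label}**"
            text = re.sub(re.escape(value), lambda _m, t=tag: t, text,
                          flags=re.IGNORECASE)
    return text


def remove_patterns(text):
    """Step 2: replace text that matches predictable identifier patterns."""
    for pattern, tag in PATTERNS:
        text = pattern.sub(tag, text)
    return text


def remove_listed_words(text, database=WORD_DATABASE):
    """Step 3: compare each word with a list of proper names and places."""
    def check(match):
        word = match.group(0)
        return database.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", check, text)


def deidentify(text, known):
    """Run the three steps in the order given in the abstract."""
    text = remove_known_identifiers(text, known)
    text = remove_patterns(text)
    return remove_listed_words(text)


# Invented example report; no real patient data.
report = ("Patient John Doe, MRN 123456. Received 03/14/2005 "
          "from Dr. Smith in Boston. Case S05-12345.")
known = {"NAME": "John Doe", "MRN": "123456"}
print(deidentify(report, known))
```

Running the known-identifier step first means that the patient's own name and numbers are removed even when they would not match any generic pattern or appear in the word list. The later steps then act as progressively broader safety nets, at the cost of occasionally removing non-identifying words.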

Highlights

  • Electronic medical records, including pathology reports, are often used for research purposes

  • The Health Insurance Portability and Accountability Act [2] (HIPAA) specifies that a de-identified data set can be created by removing eighteen specific types of identifiers

  • Since the ultimate goal of this network is to provide researchers throughout the country access to tissue specimens, it is absolutely necessary to deidentify the contents of the surgical pathology reports that form the core of the information that is contained within the network

Summary

Introduction

Electronic medical records, including pathology reports, are often used for research purposes. Investigators wishing to use medical records for research have three options: obtain permission from the patients, obtain a waiver of informed consent from their Institutional Review Board, or use a data set that has had all (de-identified data set) or most (limited data set) of the identifiers removed [1,2]. The Health Insurance Portability and Accountability Act [2] (HIPAA) specifies that a de-identified data set can be created by removing eighteen specific types of identifiers (see Table 1). These identifiers include names, ages, dates, addresses, and identifying codes of patients, their relatives, household members and employers. Since the ultimate goal of this network is to provide researchers throughout the country access to tissue specimens, it is absolutely necessary to deidentify the contents of the surgical pathology reports that form the core of the information contained within the network.
