Abstract
Medical reports, governed by HIPAA regulations, contain protected health information (PHI), which restricts secondary data use. Using natural language processing (NLP) and large language models (LLMs), we sought to employ publicly available methods to automatically anonymize PHI in free-text radiology reports. We compared two publicly available rule-based NLP models (spaCy; NLPac, accuracy-optimized; NLPsp, speed-optimized; iteratively improved on 400 free-text CT-reports (test set)) and one offline LLM approach (LLM-model, LLaMa-2, Meta-AI) for PHI-anonymization. The three models were tested on 100 randomly selected chest CT reports. Two investigators assessed whether each occurring PHI entity was anonymized and whether clinical information was removed. Subsequently, precision, recall, and F1 scores were calculated. NLPac and NLPsp successfully removed all instances of dates (n = 333), medical record numbers (MRN) (n = 6), and accession numbers (ACC) (n = 92). The LLM model removed all MRNs, 96% of ACCs, and 32% of dates. NLPac was most consistent with a perfect F1-score of 1.00, followed by NLPsp with lower precision (0.86) and F1-score (0.92) for dates. The LLM model had perfect precision for MRNs, ACCs, and dates but the lowest recall for ACC (0.96) and dates (0.52), with corresponding F1 scores of 0.98 and 0.68, respectively. Names were removed completely or almost completely (i.e., one first or family name left non-anonymized) in 100% (NLPac), 72% (NLPsp), and 90% (LLM-model) of cases. Importantly, NLPac and NLPsp did not remove medical information, while the LLM model did so in 10% of reports (n = 10). Pre-trained NLP models can effectively anonymize free-text radiology reports, while anonymization with the LLM model is more prone to deleting medical information.
Question: This study compares NLP and locally hosted LLM techniques to ensure PHI anonymization without losing clinical information.
Findings: Pre-trained NLP models effectively anonymized radiology reports without removing clinical data, while a locally hosted LLM was less reliable, risking the loss of important information.
Clinical relevance: Fast, reliable, automated anonymization of PHI from radiology reports enables HIPAA-compliant secondary use, facilitating advanced applications like LLM-driven radiology analysis while ensuring ethical handling of sensitive patient data.
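The reported F1 scores follow from the stated precision and recall values as their harmonic mean. The short Python sketch below recomputes them as a plausibility check. It is not the authors' evaluation code: the function names are hypothetical, and the recall of 1.00 for NLPsp on dates is an assumption inferred from the statement that NLPsp removed all date instances.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as stated in the abstract.
# The NLPsp recall of 1.00 is an assumption (all dates were removed).
reported = {
    "NLPsp, dates": (0.86, 1.00),
    "LLM model, ACC": (1.00, 0.96),
    "LLM model, dates": (1.00, 0.52),
}

for label, (p, r) in reported.items():
    print(f"{label}: F1 = {f1_score(p, r):.2f}")
```

Rounded to two decimals, the recomputed values are 0.92, 0.98, and 0.68, matching the F1 scores given in the abstract.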