Utility of Features in a Natural-Language-Processing-Based Clinical De-Identification Model Using Radiology Reports for Advanced NSCLC Patients

Tanmoy Paul, Michael Barnes, Preethi Aishwarya Tautam, Vasanthi Mandhadi, Abu Saleh Mohammad Mosa, Md Kamruz Zaman Rana, Humayera Islam, Yaswitha Jampani, Teja Venkat Pavan Kotapati, Richard D Hammer, Vishakha Sharma, Nitesh Singh

doi:10.3390/app12199976

Abstract

De-identifying clinical reports is essential to protect patient confidentiality. Named entity recognition (NER) models based on natural language processing (NLP) are widely used for automatic clinical de-identification, and the performance of such machine learning models depends largely on proper feature selection. The objective of this study was to investigate the utility of various features in a conditional-random-field (CRF)-based NER model. NLP toolkits were used to annotate the protected health information (PHI) in a total of 10,239 radiology reports, which were divided into seven types. A total of 10 features were extracted by the toolkit, NER models were built using these features and their combinations, and model performance was evaluated by precision, recall, and F1-score. The best-performing features were n-gram, prefix-suffix, word embedding, and word shape, which outperformed the others across all report types. The dataset was large and spanned multiple report types; this diversity ensured that the results did not depend on a small number of structured texts from which a machine learning model can easily learn the features. Because manual de-identification of large-scale clinical reports is impractical, this study helps to identify, from the wide array of features mentioned in the literature, the best-performing ones for building an NER model for automatic de-identification.
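To make three of the best-performing feature families more concrete, the following is a minimal sketch of how word shape, prefix-suffix, and n-gram (neighbouring-token) features might be computed for each token before being passed to a CRF. It is an illustration only: the abstract does not give the study's exact feature definitions or name the toolkit, so the function names (`word_shape`, `token_features`), the affix length, the context window, and the sample sentence are all assumptions. Word embeddings are omitted because they require a pretrained model.

```python
def word_shape(token: str) -> str:
    """Map each character to a class: uppercase -> X, lowercase -> x,
    digit -> d. Any other character (punctuation) is kept as-is."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    return "".join(shape)


def token_features(tokens, i, affix_len=3, window=1):
    """Build a feature dict for tokens[i].

    affix_len and window are illustrative defaults (assumptions),
    not values reported in the study.
    """
    token = tokens[i]
    feats = {
        "word.lower": token.lower(),
        "shape": word_shape(token),
        "prefix": token[:affix_len].lower(),
        "suffix": token[-affix_len:].lower(),
    }
    # n-gram context: lowercased neighbouring tokens within the window,
    # with a padding marker at sentence boundaries.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        key = f"word[{offset:+d}]"
        feats[key] = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
    return feats


def sentence_features(tokens):
    """One feature dict per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Hypothetical, made-up sentence (not taken from the study's data).
sample = ["Seen", "by", "Dr.", "Smith", "on", "10/04/2022"]
features = sentence_features(sample)
```

A list of per-token dicts like `features` is a layout that CRF libraries such as sklearn-crfsuite accept as input for one sentence. Shape features such as `dd/dd/dddd` for a date or `Xxxxx` for a capitalized name illustrate why this kind of feature can be informative for PHI categories with regular surface patterns.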


Journal: Applied Sciences
Publication Date: Oct 4, 2022
Citations: 1
License type: CC BY 4.0


Similar Papers

Investigation of the Utility of Features in a Clinical De-identification Model: A Demonstration Using EHR Pathology Reports for Advanced NSCLC Patients.
Tanmoy Paul ... Nitesh Singh
Frontiers in digital health | VOL. 4
16 Feb 2022

InaNLP: Indonesia natural language processing toolkit, case study: Complaint tweet classification
Ayu Purwarianti ... Irfan Afif
01 Aug 2016

Modified BERT-based end-to-end Chinese named entity recognition model
Yanchun Tan ... Youmin Zhu
13 Oct 2022

DeIDNER Model: A Neural Network Named Entity Recognition Model for Use in the De-identification of Clinical Notes.
Mahanazuddin Syed ... Shorabuddin Syed
Biomedical engineering systems and technologies, international joint conference, BIOSTEC ... revised selected papers. BIOSTEC (Conference) | VOL. 5
01 Jan 2021
