NER Models Research Articles

Background: ML-based mortality prediction tools in oncology can optimize clinical decisions and prompt end-of-life care discussions. Patients with advanced cancer who have engaged in Goals of Care (GoC) conversations report improved quality of life and better care alignment. However, oncologists often make overly optimistic prognoses and miss timely GoC discussions. Clinical notes are a valuable source of information, but processing and extracting data from them is time-consuming and labor-intensive. To address this issue, we have developed a machine learning application that ingests clinical notes and structured data from electronic health records (EHRs) to generate a 180-day mortality risk, prompting oncologists to hold GoC conversations.
Methods: A predictive machine learning model was developed using data from cancer patients aged 21 and above, diagnosed between January 2016 and December 2021. Data were collected from various sources, including cancer and death registries and the EHR. By analyzing structured and unstructured data from ambulatory progress notes, a clinical profile was created for each patient. The model used Spark-NLP for preprocessing, applying word2vec embeddings and pre-trained NER models to extract information on diseases, symptoms, procedures, treatments, and medications. Feature engineering techniques were used to select the best NLP features, which were then combined with structured data. The model was trained using 894 patients, employing a Random Forest classifier with 10-fold cross-validation, and tested on a separate set of 43,274 patients. Performance evaluation included ROC AUC, PR AUC, and F1 score metrics.
Results: After fine-tuning, the best model showed an AUC-ROC of 0.88 on the training set and 0.75 on the test set. At a threshold of 0.44, the model achieved balanced performance, with a sensitivity of 0.70 and a specificity of 0.71 on the test set.
Conclusions: Our team pioneered the development of an automated multi-modality pipeline that combines unstructured real-world data with structured data, allowing for training and testing of a fusion model. This automation opens the door to scaling and dissemination to enhance mortality prediction. Future work will involve qualitative analysis of implementation and acceptance in clinical practice.
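
To make the reported operating point concrete, the following is a minimal sketch of how sensitivity and specificity can be computed once predicted risks are cut at a threshold such as 0.44. It is not the authors' code: the function name `sensitivity_specificity`, the toy labels and risks, and the choice to flag a patient when the risk is greater than or equal to the threshold are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int],
    risk: Sequence[float],
    threshold: float = 0.44,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels and risk scores.

    Assumption: a patient is flagged as high risk when risk >= threshold.
    """
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, risk):
        flagged = score >= threshold
        if label == 1:
            if flagged:
                tp += 1  # died within the window and was flagged
            else:
                fn += 1  # died within the window but was missed
        else:
            if flagged:
                fp += 1  # survived but was flagged
            else:
                tn += 1  # survived and was not flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy data: 1 = died within 180 days, 0 = survived.
labels = [1, 1, 1, 0, 0, 0, 0]
risks = [0.9, 0.5, 0.3, 0.6, 0.2, 0.1, 0.4]
sens, spec = sensitivity_specificity(labels, risks, threshold=0.44)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Lowering the threshold generally raises sensitivity at the cost of specificity, which is why a single "balanced" threshold is usually reported alongside the threshold-free AUC.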

Named entity recognition (NER) models are widely used for identifying named entities (e.g., individuals, locations, and other information) in text documents. Machine learning based NER models are increasingly being applied in privacy-sensitive applications that need automatic and scalable identification of sensitive information to redact text for data sharing. In this paper, we study the setting in which NER models are available as a black-box service for identifying sensitive information in user documents, and show that these models are vulnerable to membership inference on their training datasets. With updated pre-trained NER models from spaCy, we demonstrate two distinct membership attacks on these models. Our first attack capitalizes on unintended memorization in the NER's underlying neural network, a phenomenon neural networks are known to be vulnerable to. Our second attack leverages a timing side-channel to target NER models that maintain vocabularies constructed from the training data. We show that words within the training dataset follow different functional paths from words not previously seen, and that this produces measurable differences in execution time. Revealing the membership status of training samples has clear privacy implications. For example, in text redaction, the sensitive words or phrases that are to be found and removed are at risk of being detected in the training dataset. Our experimental evaluation includes the redaction of both password and health data, presenting both security risks and privacy/regulatory issues. This is exacerbated by results that indicate memorization after only a single phrase. We achieved a 70% AUC in our first attack on a text redaction use case. We also show overwhelming success in the second, timing-based attack, with a 99.23% AUC. Finally, we discuss potential mitigation approaches to realize the safe use of NER models in light of the presented privacy and security implications of membership inference attacks.
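
The AUC figures quoted for the two attacks can be read as the probability that a randomly chosen training-set member receives a higher attack score than a randomly chosen non-member. The sketch below computes that quantity by direct pairwise comparison. It is an illustration under stated assumptions, not the paper's evaluation code: the function name `membership_auc` and the example scores are hypothetical, and the attack score could stand for anything the attacker measures, such as a model confidence or a negated execution time.

```python
from typing import Sequence


def membership_auc(
    member_scores: Sequence[float],
    non_member_scores: Sequence[float],
) -> float:
    """AUC of a membership attack via pairwise comparison.

    Counts the fraction of (member, non-member) pairs in which the member
    has the higher attack score; ties count as half a win.
    """
    if not member_scores or not non_member_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for m in member_scores:
        for n in non_member_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(non_member_scores))


# Hypothetical attack scores (higher means "more likely a training member").
members = [0.9, 0.8, 0.4]
non_members = [0.5, 0.3, 0.1]
print(f"attack AUC = {membership_auc(members, non_members):.3f}")
```

An AUC of 0.5 corresponds to an attacker who cannot distinguish members from non-members at all, so values close to 1.0 indicate a near-perfect separation of the two groups.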

Articles published on NER Models

A Web Semantic-Based Text Analysis Approach for Enhancing Named Entity Recognition Using PU-Learning and Negative Sampling

Improving serious illness conversations in oncology: A machine learning approach that integrates natural language processing for mortality prediction.

Tuning Out the Noise: Benchmarking Entity Extraction for Digitized Native American Literature

Local and global character representation enhanced model for Chinese medical named entity recognition

AGORA: An intelligent system for the anonymization, information extraction and automatic mapping of sensitive documents

Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-Grained Student Ensemble

Tourism Information QA Datasets for Smart Tourism Chatbot

OLID-BR: offensive language identification dataset for Brazilian Portuguese

Unintended Memorization and Timing Attacks in Named Entity Recognition Models

VisPhone: Chinese named entity recognition model enhanced by visual and phonetic features

Exploiting Morpheme and Cross-lingual Knowledge to Enhance Mongolian Named Entity Recognition

A multi-layer soft lattice based model for Chinese clinical named entity recognition

SoftNER: Mining knowledge graphs from cloud incidents

Leveraging Part-of-Speech Tagging Features and a Novel Regularization Strategy for Chinese Medical Named Entity Recognition

How Do Your Biomedical Named Entity Recognition Models Generalize to Novel Entities?

Context-aware Adversarial Training for Name Regularity Bias in Named Entity Recognition

A Unified Multi-Task Learning Framework for Joint Extraction of Entities and Relations

Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study

CollaboNet: collaboration of deep neural networks for biomedical named entity recognition
