Abstract

Many research problems involving medical texts have limited amounts of annotated data available (e.g., expressions of rare diseases). Traditional supervised machine learning algorithms, particularly those based on deep neural networks, require large volumes of annotated data, and they underperform when only small amounts of labeled data are available. Few-shot learning (FSL) is a category of machine learning models that are designed with the intent of solving problems that have small annotated datasets available. However, there is no current study that compares the performances of FSL models with traditional models (e.g., conditional random fields) for medical text at different training set sizes. In this paper, we attempted to fill this gap in research by comparing multiple FSL models with traditional models for the task of named entity recognition (NER) from medical texts. Using five health-related annotated NER datasets, we benchmarked three traditional NER models based on BERT-BERT-Linear Classifier (BLC), BERT-CRF (BC) and SANER; and three FSL NER models-StructShot & NNShot, Few-Shot Slot Tagging (FS-ST) and ProtoNER. Our benchmarking results show that almost all models, whether traditional or FSL, achieve significantly lower performances compared to the state-of-the-art with small amounts of training data. For the NER experiments we executed, the F1-scores were very low with small training sets, typically below 30%. FSL models that were reported to perform well on non-medical texts significantly underperformed, compared to their reported best, on medical texts. Our experiments also suggest that FSL methods tend to perform worse on data sets from noisy sources of medical texts, such as social media (which includes misspellings and colloquial expressions), compared to less noisy sources such as medical literature. Our experiments demonstrate that the current state-of-the-art FSL systems are not yet suitable for effective NER in medical natural language processing tasks, and further research needs to be carried out to improve their performances. Creation of specialized, standardized datasets replicating real-world scenarios may help to move this category of methods forward.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call