Named entity recognition (NER) based on deep neural networks has shown competitive performance when trained on large-scale human-annotated data. However, such models struggle in low-resource settings, where labeled data are scarce. A typical solution is pseudo-labeling, which assigns pseudo-labels to the certain (i.e., high-confidence) tokens of unlabeled sentences while discarding the uncertain (i.e., low-confidence) ones. Two challenges remain: (1) discarding the uncertain tokens leads to low utilization of unlabeled data; (2) pseudo-labeling with a fixed confidence threshold suffers from an intrinsic quality-quantity trade-off. In this work, we propose Uncertainty-Aware Contrastive Learning (UACL), a novel method for semi-supervised named entity recognition. Specifically, UACL first uses a Gaussian-based class-wise token separation mechanism to dynamically distinguish certain from uncertain tokens, self-adaptively adjusting the confidence threshold to balance the quantity and quality of pseudo-labeled certain tokens. It then performs pseudo-supervised learning on the certain tokens and contrastive learning on the uncertain ones, which not only improves the utilization of unlabeled data but also provides uncertainty-aware guidance for model training. Furthermore, our method leverages uncertain tokens to optimize token representations, further improving performance. Extensive experiments on four benchmarks demonstrate that our proposed approach surpasses previous leading low-resource baselines.
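The class-wise certain/uncertain split described above can be illustrated with a minimal sketch. The abstract does not specify the exact Gaussian criterion, so the form below (fit a Gaussian to each predicted class's max-softmax confidences and threshold at `mu - k*sigma`) is an assumption for illustration; the function name and parameter `k` are hypothetical, not from the paper.

```python
import numpy as np

def split_certain_uncertain(probs, k=1.0):
    """Split tokens into certain / uncertain via a per-class dynamic threshold.

    probs: (num_tokens, num_classes) softmax outputs for unlabeled tokens.
    For each predicted class, model the max-softmax confidences as a Gaussian
    and mark tokens with confidence >= mu - k*sigma as "certain".
    (Assumed threshold form; the paper's exact mechanism may differ.)
    """
    preds = probs.argmax(axis=-1)          # pseudo-labels
    conf = probs.max(axis=-1)              # confidence per token
    certain = np.zeros(len(preds), dtype=bool)
    for c in np.unique(preds):
        mask = preds == c
        mu, sigma = conf[mask].mean(), conf[mask].std()
        certain |= mask & (conf >= mu - k * sigma)  # class-wise threshold
    return preds, certain

# Toy usage: 4 tokens, 3 classes. Certain tokens would receive
# pseudo-supervised loss; uncertain ones a contrastive loss.
probs = np.array([[0.90, 0.05, 0.05],
                  [0.40, 0.30, 0.30],
                  [0.95, 0.03, 0.02],
                  [0.34, 0.33, 0.33]])
preds, certain = split_certain_uncertain(probs)
```

Because the threshold is derived from the current confidence distribution rather than fixed, it tightens as the model grows confident and loosens early in training, which is one way to balance the quantity and quality of pseudo-labeled tokens.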