Abstract

Pre-trained language models (PLMs) have achieved notable success on a variety of natural language processing tasks, such as sequence labeling. In particular, existing sequence labeling methods fine-tune PLMs on labeled data, which avoids training sequence labeling models from scratch; however, the fine-tuning process still requires large amounts of labeled training data to be effective. Obtaining rich annotated data for sequence labeling is time-consuming and expensive, which creates a substantial barrier to directly applying PLMs trained on general-purpose large-scale text to sequence labeling. In this paper, we investigate sequence labeling tasks from a novel perspective and propose a general framework that uses labeled clue sentences to mitigate the problem of insufficient annotation data. Specifically, we first retrieve labeled clue sentences for each original sentence in the training set based on semantic (or syntactic) relevance; the number of annotated clue sentences determines how much the training set is expanded. Then, we modify the transformer’s self-attention mechanism so that it exploits not only the contextual information of the original sentence but also the contextual and label information of the labeled clue sentences. In addition, we devise a label masking strategy to further avoid over-fitting: we randomly mask out the labels of certain tokens in the clue sentences and predict these masked labels from the context of the corresponding tokens. We verify the effectiveness and generalizability of the proposed framework on three sequence labeling tasks: Chinese Named Entity Recognition, English Named Entity Recognition, and Aspect Term Extraction. Extensive experimental results show that our method yields state-of-the-art or competitive results on all three tasks.
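As a rough illustration of the retrieval and label-masking steps described above (not the authors' implementation), the sketch below retrieves clue sentences for an input sentence by cosine similarity over toy bag-of-words embeddings and randomly masks some of their labels. The names retrieve_clues and mask_clue_labels, the mask probability, and the embedding function are illustrative assumptions; in the paper, relevance would come from a semantic (or syntactic) model, and the modified self-attention over clue labels is not shown here.

```python
import random
import numpy as np

MASK_LABEL = "[MASK]"  # hypothetical placeholder for a masked label

def embed(sentence, vocab):
    """Toy bag-of-words embedding; in practice a PLM encoder would supply vectors."""
    vec = np.zeros(len(vocab))
    for tok in sentence:
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def retrieve_clues(original, labeled_pool, vocab, k=2):
    """Return the k labeled sentences most relevant to the original sentence."""
    q = embed(original, vocab)
    scored = [(float(q @ embed(tokens, vocab)), tokens, labels)
              for tokens, labels in labeled_pool]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(tokens, labels) for _, tokens, labels in scored[:k]]

def mask_clue_labels(labels, mask_prob=0.15):
    """Randomly replace clue-sentence labels with a mask symbol to curb over-fitting."""
    return [MASK_LABEL if random.random() < mask_prob else y for y in labels]

# Usage with a toy NER-style pool of labeled clue sentences.
pool = [(["Paris", "is", "in", "France"], ["B-LOC", "O", "O", "B-LOC"]),
        (["Apple", "hired", "John"],      ["B-ORG", "O", "B-PER"])]
query = ["Paris", "lies", "near", "Orleans"]
vocab = {tok: i for i, tok in enumerate({t for s, _ in pool for t in s} | set(query))}
for tokens, labels in retrieve_clues(query, pool, vocab, k=1):
    print(tokens, mask_clue_labels(labels, mask_prob=0.3))
```

The retrieved clue sentence, together with its (partially masked) labels, would then be concatenated with the original sentence and fed to the modified self-attention layer, where the model predicts the masked labels from the corresponding token contexts.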
