Current sequence labeling systems are typically based on CNNs-BiLSTM-CRF, which is widely regarded as the standard method. However, the standard method usually embeds input words with static word embeddings, which ignore polysemy. In addition, BiLSTM is limited in its ability to model global context information, which makes it even harder for the standard method to correctly label ambiguous words. In this paper, we propose a multi-level topic-aware mechanism that obtains word-level and corpus-level topic representations for sequence labeling tasks. The word-level topic representation captures different word senses to some extent and makes each word more discriminative. The corpus-level topic representation provides global semantic information that helps in understanding a word across different contexts. Both the word-level and corpus-level topic representations are adaptively fused with the sequential information extracted by BiLSTM via a proposed fusion gate. Experimental results show that the proposed model outperforms state-of-the-art variants of the standard method. In particular, the multi-level topic-aware mechanism improves performance on ambiguous words.
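To make the idea of adaptive fusion concrete, the following is a minimal sketch of one possible gate. The abstract does not give the gate's equations, so everything here is an assumption made for illustration: the function name `fusion_gate`, the parameters `W` and `b`, the sigmoid gate computed from the concatenated inputs, and the additive combination of the two topic vectors are all hypothetical and may differ from the formulation in the paper.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fusion_gate(h, t_word, t_corpus, W, b):
    """Fuse a BiLSTM state with two topic vectors (assumed form, not the paper's).

    h, t_word, t_corpus: lists of floats, all of length d.
    W: d x 3d weight matrix (list of rows); b: bias of length d.
    """
    d = len(h)
    assert len(t_word) == d and len(t_corpus) == d
    concat = h + t_word + t_corpus  # [h; t_word; t_corpus]
    # Gate values in (0, 1), one per dimension.
    g = [sigmoid(sum(w * x for w, x in zip(row, concat)) + b_i)
         for row, b_i in zip(W, b)]
    # Assumption: the two topic vectors are combined by simple addition.
    topic = [tw + tc for tw, tc in zip(t_word, t_corpus)]
    # Convex combination per dimension: g near 1 keeps h, g near 0 keeps topic.
    return [g_i * h_i + (1.0 - g_i) * t_i
            for g_i, h_i, t_i in zip(g, h, topic)]


# Toy example with d = 2 and zero weights, so every gate value is 0.5.
h = [1.0, 2.0]          # stand-in for a BiLSTM hidden state
t_word = [0.0, 2.0]     # stand-in for a word-level topic vector
t_corpus = [3.0, 0.0]   # stand-in for a corpus-level topic vector
W = [[0.0] * 6 for _ in range(2)]
b = [0.0, 0.0]
fused = fusion_gate(h, t_word, t_corpus, W, b)  # [2.0, 2.0]
```

In a trained model `W` and `b` would be learned, so the gate could lean on the topic representations for ambiguous words and on the sequential state elsewhere. The toy values above only show the mechanics.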