Abstract
Because Chinese text has no explicit markers for word boundaries, named entities are often harder to identify in Chinese than in English. Current Chinese named entity recognition models are typically trained on character vectors or word vectors produced in a preprocessing step. Each choice has a drawback: using character vectors as the input of the neural network discards the semantic meaning of words and their explicit boundary information, while using word vectors makes the model depend on the accuracy of the word segmentation algorithm. To address both problems, this paper proposes a Chinese named entity recognition model based on character-word vector fusion, CWVF-BiLSTM-CRF (Character Word Vector Fusion-Bidirectional Long Short-Term Memory-Conditional Random Field). First, Word2Vec is used to build two dictionaries, one mapping characters to character vectors and one mapping words to word vectors. Second, the character and word vectors are fused to form the input unit of the BiLSTM (Bidirectional Long Short-Term Memory) network, and a CRF (conditional random field) layer then resolves the problem of unreasonable tag sequences. The proposed model reduces the dependence on the accuracy of the word segmentation algorithm and makes effective use of the semantic characteristics of words. Experimental results show that the model based on character-word vector fusion improves recognition performance for Chinese named entities.
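To illustrate what an "unreasonable tag sequence" means and how a CRF-style decoder can avoid one, here is a minimal sketch. It is not the paper's implementation. The tag set, the emission scores (standing in for BiLSTM outputs), and the transition scores (standing in for learned CRF parameters) are all made-up assumptions. The sketch compares per-character greedy tagging with Viterbi decoding under a transition matrix that heavily penalizes `O` followed by `I-LOC`.

```python
from typing import List

TAGS = ["O", "B-LOC", "I-LOC"]  # hypothetical tag set for illustration
PENALTY = -1e4                   # assumed score for a forbidden transition

# transitions[i][j]: score of moving from tag i to tag j (toy values).
# Only O -> I-LOC is penalized, since an entity cannot start with an I- tag.
transitions = [
    [0.0, 0.0, PENALTY],  # from O
    [0.0, 0.0, 0.0],      # from B-LOC
    [0.0, 0.0, 0.0],      # from I-LOC
]

# emissions[t][j]: per-character tag scores, standing in for BiLSTM outputs.
emissions = [
    [2.0, 1.0, 0.0],
    [0.5, 1.0, 1.2],
    [1.0, 0.0, 0.0],
]


def greedy_decode(emissions: List[List[float]]) -> List[int]:
    """Pick the best tag per position, ignoring transitions."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in emissions]


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring tag path under emission + transition scores."""
    n_tags = len(transitions)
    scores = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for landing on tag j at this position.
            best_i = max(range(n_tags),
                         key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    path = [max(range(n_tags), key=lambda j: scores[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


greedy_tags = [TAGS[i] for i in greedy_decode(emissions)]
viterbi_tags = [TAGS[i] for i in viterbi_decode(emissions, transitions)]
print(greedy_tags)   # ['O', 'I-LOC', 'O']  (I-LOC without a preceding B-LOC)
print(viterbi_tags)  # ['O', 'B-LOC', 'O']
```

In a trained model the transition scores would be learned jointly with the network rather than set by hand. The hand-set penalty here only makes the effect easy to see.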
Highlights
In a broad sense, the purpose of named entity recognition (NER) is to recognize the named entities in a text and classify them into the corresponding entity types.
Character-word vector fusion is key to Chinese named entity recognition, and we propose fusing each character vector with the vector of the word that contains the character.
To search for the optimal structure of the named entity recognition model, we run tuning experiments on the common parameters that affect model performance.
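The fusion of a character vector with the vector of the word that contains the character, as named in the highlights above, can be sketched as follows. This excerpt does not say how the two vectors are combined, so the sketch assumes plain concatenation. The function name `fuse_char_word_vectors`, the toy dictionaries, and the zero-vector fallback for unknown entries are likewise assumptions for illustration. In practice the two dictionaries would come from Word2Vec training.

```python
from typing import Dict, List


def fuse_char_word_vectors(
    segmented: List[str],
    char_vecs: Dict[str, List[float]],
    word_vecs: Dict[str, List[float]],
    char_dim: int,
    word_dim: int,
) -> List[List[float]]:
    """Build one fused vector per character of a segmented sentence.

    Each character vector is concatenated with the vector of the word
    containing that character (concatenation is an assumption here).
    Unknown characters or words fall back to zero vectors.
    """
    fused = []
    for word in segmented:
        w_vec = word_vecs.get(word, [0.0] * word_dim)
        for ch in word:
            c_vec = char_vecs.get(ch, [0.0] * char_dim)
            fused.append(c_vec + w_vec)  # list concatenation
    return fused


# Toy dictionaries standing in for Word2Vec output (made-up values).
char_vecs = {"南": [0.1, 0.2], "京": [0.3, 0.4],
             "大": [0.5, 0.6], "学": [0.7, 0.8]}
word_vecs = {"南京": [1.0, 0.0, 0.0], "大学": [0.0, 1.0, 0.0]}

segmented = ["南京", "大学"]  # output of some word segmenter
fused = fuse_char_word_vectors(segmented, char_vecs, word_vecs, 2, 3)
for ch, vec in zip("".join(segmented), fused):
    print(ch, vec)
```

The result is one vector per character, so the sequence can still be tagged character by character while each input unit also carries information about the word it belongs to.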
Summary
The purpose of named entity recognition (NER) is to recognize the named entities in a text and classify them into the corresponding entity types. Lample et al. [7] used a BiLSTM to extract character-level features, fused them with word vectors from dictionaries to form the final input vector, and combined the BiLSTM with a CRF model to perform named entity recognition, achieving good results on English, German, Spanish, and other test corpora. The methods proposed by Ma and Hovy and by Lample et al. both rely on word vectors for named entity recognition in non-Chinese corpora, where the accuracy of word segmentation does not need to be considered; in Chinese corpora, however, the accuracy of word segmentation cannot be ignored.