Abstract

As the first step of building a knowledge graph to record the ASEAN counties’ information, we aim to conduct Named-entity Recognition (NER) on the Chinese news about ASEAN counties. We employ a Bi-directional gated recurrent unit to replace the LSTM architecture to improve both models’ effectiveness and capability in understanding polysemous words. The state-of-the-art word embedding model, BERT, has also been included to generate qualified word vectors for the NER task. Besides, we also propose a similarity-based dataset partition method to help model learning the polysemy within the Chinese news. Experiments have been done to demonstrate that the combination of such improvements can benefit the models’ performance in identifying different types of named entities.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call