Abstract

Entity linking on social media is important for natural language processing tasks such as sentiment analysis, question answering (QA), and machine translation. Compared with English entity linking, Chinese entity linking poses its own difficulties. As with entity linking for short texts in general, Chinese microblogs are noisy, and mentions lack informative context. To address these problems, we present a new model for entity linking on Chinese microblogs. Entity linking usually involves two steps: candidate entity generation and candidate entity ranking. First, based on the characteristics of the Chinese language, we propose multi-method fusion strategies for candidate generation that improve the recall of candidate entities. Second, we propose a new neural network model, TAS (Topic Attention Siamese), for candidate entity ranking. TAS adds topic semantics to a Siamese network to learn representations of the context, mention, and entity, and ranks candidates by mention-entity similarity. The mention representation incorporates information from multiple sentences on the same topic, which effectively mitigates the lack of contextual information. We also use the Character-enhanced Word Embedding model (CWE) to pre-train both word and character embeddings, reducing the impact of noise and word segmentation errors. Experimental results show that our method significantly outperforms state-of-the-art methods for entity linking on Chinese social media.
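The two-step pipeline described above (fused candidate generation, then similarity-based ranking with a shared encoder) can be illustrated with a small sketch. This is not the TAS model. It assumes a hypothetical toy knowledge base (`KB`) whose made-up three-dimensional vectors stand in for CWE-pretrained embeddings, uses exact-name, alias-dictionary, and character-overlap matching as illustrative fusion strategies, and replaces the learned network with a fixed, hypothetical shared encoder (`W` followed by tanh) applied to both branches, Siamese-style. The topic attention here is an assumed simplification: a softmax over each same-topic sentence's dot product with a topic vector.

```python
import math
from typing import Dict, List, Set

# Hypothetical toy knowledge base: entity id -> name, aliases, embedding.
# The 3-d vectors are invented stand-ins for pre-trained (e.g. CWE) embeddings.
KB: Dict[str, dict] = {
    "Apple_Inc": {"name": "苹果公司", "aliases": {"苹果", "Apple"}, "vec": [0.9, 0.1, 0.0]},
    "Apple_fruit": {"name": "苹果", "aliases": set(), "vec": [0.1, 0.9, 0.0]},
    "Xiaomi_Inc": {"name": "小米科技", "aliases": {"小米"}, "vec": [0.8, 0.0, 0.3]},
}

# Hypothetical shared encoder weights; both Siamese branches use the same W.
W = [[1.0, 0.2, 0.0], [0.0, 1.0, 0.2], [0.2, 0.0, 1.0]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: List[float], b: List[float]) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def char_jaccard(a: str, b: str) -> float:
    """Character-level overlap, which sidesteps Chinese word segmentation."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def generate_candidates(mention: str, threshold: float = 0.5) -> Set[str]:
    """Step 1: fuse several matching strategies by union to raise recall."""
    exact = {e for e, v in KB.items() if v["name"] == mention}
    alias = {e for e, v in KB.items() if mention in v["aliases"]}
    fuzzy = {e for e, v in KB.items() if char_jaccard(mention, v["name"]) >= threshold}
    return exact | alias | fuzzy


def encode(vec: List[float]) -> List[float]:
    """Shared encoder: tanh(W x), applied identically to mention and entity."""
    return [math.tanh(dot(row, vec)) for row in W]


def topic_attention(sentence_vecs: List[List[float]], topic_vec: List[float]) -> List[float]:
    """Softmax-weighted sum of same-topic sentence vectors."""
    scores = [dot(s, topic_vec) for s in sentence_vecs]
    top = max(scores)
    exps = [math.exp(x - top) for x in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * s[i] for w, s in zip(weights, sentence_vecs)) for i in range(len(topic_vec))]


def rank(mention_vec: List[float], sentence_vecs: List[List[float]],
         topic_vec: List[float], candidates: Set[str]):
    """Step 2: score each candidate by cosine similarity of encoded vectors."""
    context = topic_attention(sentence_vecs, topic_vec)
    mention_rep = encode([m + c for m, c in zip(mention_vec, context)])
    scored = [(e, cosine(mention_rep, encode(KB[e]["vec"]))) for e in candidates]
    return sorted(scored, key=lambda t: t[1], reverse=True)


candidates = generate_candidates("苹果")
ranked = rank(
    mention_vec=[0.5, 0.5, 0.0],                        # ambiguous on its own
    sentence_vecs=[[0.9, 0.0, 0.2], [0.7, 0.1, 0.1]],  # other posts on a tech topic
    topic_vec=[1.0, 0.0, 0.0],
    candidates=candidates,
)
print(ranked[0][0])  # Apple_Inc
```

Taking the union of strategies trades precision for recall at generation time and leaves disambiguation to the ranker. In this toy run, the ambiguous mention 苹果 yields both candidates, and the attended same-topic sentences pull the mention representation toward the company entity.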
