Abstract

Named entity recognition is a fundamental task in natural language processing that aims to identify entities such as persons, places, and organizations in text. Identifying names in ancient Chinese literature helps uncover traditional Chinese culture and promote its spirit. Unfortunately, named entity recognition in ancient Chinese literature faces two main problems: (1) A scarcity of annotated corpora has led to little research in this area. (2) Most existing work focuses only on character embeddings, resulting in limited performance. This is because character vectors alone struggle to capture the relationship between characters and words when processing Chinese texts, especially ancient Chinese texts. To tackle these problems, we first use distant supervision to construct the required annotated dataset, and then propose a boundary-detection-enhanced named entity recognition model based on BERT+CRF. Comparative experiments show that the proposed framework is effective, achieving a best F1 score of 81.24%.
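The abstract does not describe how the distant supervision step is carried out. As a rough illustration only, one common form of distant supervision labels raw text by matching it against an entity dictionary. The sketch below assumes a greedy longest-match lookup and BIO character tags; the function name `distant_label`, the gazetteer entries, and the tag set are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def distant_label(text: str, gazetteer: Dict[str, str]) -> List[str]:
    """Tag each character with a BIO label via greedy longest-match lookup.

    `gazetteer` maps an entity surface form to its type (e.g. "PER").
    This is an assumed, simplified labeling scheme, not the paper's method.
    """
    labels = ["O"] * len(text)
    max_len = max((len(name) for name in gazetteer), default=0)
    i = 0
    while i < len(text):
        matched = False
        # Try the longest candidate span first so longer names win.
        for span in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + span]
            if candidate in gazetteer:
                entity_type = gazetteer[candidate]
                labels[i] = f"B-{entity_type}"
                for j in range(i + 1, i + span):
                    labels[j] = f"I-{entity_type}"
                i += span
                matched = True
                break
        if not matched:
            i += 1
    return labels


# Hypothetical gazetteer entries, for illustration only.
example_gazetteer = {"孔子": "PER", "鲁": "LOC"}
sentence = "孔子生于鲁"
for char, tag in zip(sentence, distant_label(sentence, example_gazetteer)):
    print(char, tag)
```

Labels produced this way are noisy (dictionary gaps and ambiguous matches both cause errors), which is one reason a model with explicit boundary detection could help on such data.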
