Abstract

Traditional named-entity recognition methods ignore the correlations between named entities and lose the hierarchical structural information between the named entities in a given text. Although these methods are effective on conventional datasets with simple structures, they are less effective on sports texts. This paper proposes a Chinese sports text named-entity recognition method based on a character graph convolutional neural network (Char GCN) combined with a self-attention mechanism. In this method, each Chinese character in the sports text is treated as a node, and the edges between nodes are constructed from the positions of similar characters and the character features of the named entities in the text. The character graph convolutional neural network extracts the internal structural information of the entities. The self-attention model captures the hierarchical semantic information of the sports text, strengthening the relationships between named entities and capturing the relevance of and dependencies between characters. Finally, a conditional random field (CRF) classification layer accurately identifies the named entities in the Chinese sports text. Experiments on four datasets show that the proposed method achieves F-scores of 92.51%, 91.91%, 93.98%, and 95.01%, respectively, a significant improvement over traditional named-entity recognition methods.
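The character-graph idea can be illustrated with a minimal pure-Python sketch of a single graph convolution layer. The edge rule is an assumption made only for illustration: two characters are linked if they lie within `window` positions of each other or are the same character, standing in for the paper's position- and feature-based edges. The layer uses the common symmetric normalization D^-1/2 (A + I) D^-1/2 followed by ReLU(Â·H·W); the function names `build_char_graph`, `normalize`, and `gcn_layer` are hypothetical.

```python
import math


def build_char_graph(text, window=1):
    """Hypothetical edge rule: link characters within `window` positions
    and link identical characters (a stand-in for the paper's edges)."""
    n = len(text)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and (abs(i - j) <= window or text[i] == text[j]):
                adj[i][j] = 1.0
    return adj


def normalize(adj):
    """Add self-loops, then apply symmetric normalization D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain nested-list matrix product (rows of a times columns of b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj_norm, features, weight):
    """One graph convolution: ReLU(A_norm @ H @ W), one output row per character."""
    out = matmul(matmul(adj_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


adj = build_char_graph("ABA")
hidden = gcn_layer(normalize(adj), [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
                   [[1.0], [1.0], [1.0]])
```

In a full model, `features` would be character embeddings and `weight` a learned matrix; here both are fixed so the arithmetic can be checked by hand.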

Highlights

  • Named-entity recognition refers to the identification of entities with a specific meaning, such as person names, place names, and organization names, in large amounts of unstructured or structured text.

  • In the labeling scheme, the initial entity unit is labeled B, the internal entity unit is labeled I, and other non-entity words are labeled O; the name of an athlete is labeled Sport PER (SPER), and a team is referred to as Steam.
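The labeling scheme above can be sketched as a conversion between character-level entity spans and tags. This assumes the common `B-TYPE`/`I-TYPE` prefix convention combined with the SPER and Steam types from the highlight; the helper names `bio_tags` and `spans_from_tags` and the example sentence are illustrative, not taken from the paper.

```python
def bio_tags(text, entities):
    """Convert (start, end, type) spans (end exclusive) into one tag per character."""
    tags = ["O"] * len(text)
    for start, end, etype in entities:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


def spans_from_tags(tags):
    """Recover (start, end, type) spans; a stray I- tag without a matching B- is ignored."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if start is not None:
                spans.append((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return spans


# "Yao Ming joins the Rockets team": athlete = SPER, team = Steam
text = "姚明加盟火箭队"
tags = bio_tags(text, [(0, 2, "SPER"), (4, 7, "Steam")])
```

Here `tags` is `["B-SPER", "I-SPER", "O", "O", "B-Steam", "I-Steam", "I-Steam"]`, and `spans_from_tags(tags)` returns the original spans, which is the form an entity-level F-score is usually computed on.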

  • To further capture the correlations between the characters in the text, between the characters and the named entities, and between the positions of entity characters, a multi-head self-attention mechanism was introduced in the bidirectional long short-term memory (Bi-LSTM) layer [27,28].
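Multi-head self-attention over Bi-LSTM outputs can be sketched in pure Python with scaled dot-product attention, softmax(QKᵀ/√d_k)·V. This is a simplified sketch, not the paper's layer: the learned query, key, value, and output projections are omitted (treated as identity), so each head simply attends within its own slice of the hidden vector. The function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V, row by row."""
    d_k = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def multi_head_self_attention(h, num_heads):
    """Split each hidden vector (e.g. a Bi-LSTM output) into num_heads slices,
    run self-attention per slice, and concatenate. Learned projections omitted."""
    dim = len(h[0])
    if dim % num_heads != 0:
        raise ValueError("hidden size must be divisible by num_heads")
    d = dim // num_heads
    heads = []
    for s in range(num_heads):
        chunk = [row[s * d:(s + 1) * d] for row in h]
        heads.append(attention(chunk, chunk, chunk))
    return [[x for head in heads for x in head[i]] for i in range(len(h))]


# Three characters, hidden size 4, two heads of size 2
bilstm_out = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [0.3, 0.3, 0.1, 0.0]]
context = multi_head_self_attention(bilstm_out, num_heads=2)
```

Each output row mixes information from every character in the sentence, which is how the attention layer relates characters that are far apart in the sequence.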


Summary

Introduction

Named-entity recognition refers to the identification of entities with a specific meaning. To extract the relevant named entities from unstructured texts about drug compounds, Pei et al. [10] added an attention mechanism to a combined bidirectional long short-term memory (Bi-LSTM) and CRF framework to increase the weights of key features in the text. The deep learning algorithms used by the researchers cited above eliminate the errors introduced by manual feature engineering and thereby improve the accuracy of named-entity recognition. However, most of these algorithms are based on simple word embedding techniques such as …, which lose the semantic information of the named entities in the text and ignore the hierarchical information between them. To solve these problems, this paper proposes a Chinese sports text named-entity recognition method based on a character-level graph convolutional self-attention network (CGCN-SAN).
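The CRF layer in Bi-LSTM-CRF style taggers picks the best label sequence with Viterbi decoding over per-character emission scores and label-to-label transition scores. The sketch below is a generic linear-chain Viterbi decoder in pure Python, assuming emissions come from an upstream network such as a Bi-LSTM; start/stop transitions and CRF training are omitted, and the function name and scores are hypothetical.

```python
def viterbi(emissions, transitions, labels):
    """Highest-scoring label sequence for a linear-chain CRF.

    emissions[t][j]: score of label j at character t (e.g. from a Bi-LSTM).
    transitions[i][j]: score of moving from label i to label j.
    """
    n_labels = len(labels)
    score = list(emissions[0])  # best score of any path ending in each label
    backptrs = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backptrs.append(ptrs)
    # Follow back-pointers from the best final label
    best = max(range(n_labels), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backptrs):
        best = ptrs[best]
        path.append(best)
    path.reverse()
    return [labels[j] for j in path]


labels = ["B-SPER", "I-SPER", "O"]
emissions = [[0.0, -5.0, 1.0], [0.0, 2.0, 0.0]]
# Strongly penalize the invalid transition O -> I-SPER
transitions = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, -1e4, 0.0]]
best_path = viterbi(emissions, transitions, labels)
```

With all-zero transitions the decoder would return the invalid sequence `["O", "I-SPER"]`; penalizing O → I-SPER makes it return `["B-SPER", "I-SPER"]`, which illustrates why a CRF layer helps enforce well-formed entity tags.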

Feature Learning
Network
Methodology
Named-entity
Datasets
Experimental Environment and Algorithm Parameters
Impact of Features on Identification Framework Performance
Comparison with Other Identification Frameworks
Comparison with Other Studies
The Effect of Layers on the Performance
Conclusions

