Abstract
In deep learning-based named entity recognition for the Gesar epic, vectorized representation is a crucial step. Traditional syllable vector representations are too homogeneous, which leads to suboptimal performance on the downstream task. To address this problem, this paper proposes a BERT-BiLSTM-CRF method that takes Tibetan syllables as the basic unit. BERT, as a multi-layer representation learning model, enhances the semantic representation of Tibetan syllables and generates syllable vectors dynamically from contextual features, which allows Gesar epic named entities to be identified more accurately. Experiments show that the method works well on the Gesar Classic corpus of the Four Descending Histories: the accuracy, recall and F-value were 98.56%, 98.67% and 98.11%, respectively. The experiments also found that the line-to-entity ratio of the Gesar epic was 3:1, involving an equal number of complete lines and entities of the epic. This reflects the fact that the Gesar epic is a very rich dataset of named entities in Tibetan texts, indicating that named entities are part of the appeal of the epic.
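To make the syllable-level setup concrete, the following is a minimal sketch and not the authors' implementation. It assumes that syllables are obtained by splitting on the Tibetan tsheg mark (U+0F0B), that entities are labelled with a BIO scheme, and that scores are computed at the entity level as precision, recall and F-value; whether the reported "accuracy" corresponds to precision is also an assumption. The function names, tag names and toy tags are hypothetical.

```python
TSHEG = "\u0f0b"  # Tibetan intersyllabic mark (tsheg)


def split_syllables(text):
    """Split Tibetan text into syllables on the tsheg mark (assumed tokenization)."""
    return [s for s in text.split(TSHEG) if s.strip()]


def extract_entities(tags):
    """Collect (start, end, type) spans from BIO tags; end is exclusive."""
    entities = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            entities.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities


def precision_recall_f1(gold, pred):
    """Entity-level scores: a predicted span counts only if it matches exactly."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy input: four syllables, each followed by a tsheg.
text = TSHEG.join(["གེ", "སར", "རྒྱལ", "པོ"]) + TSHEG
syllables = split_syllables(text)

# Hypothetical gold and predicted tag sequences for a five-syllable line.
gold_tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
pred_tags = ["B-PER", "I-PER", "O", "O", "O"]
scores = precision_recall_f1(extract_entities(gold_tags),
                             extract_entities(pred_tags))
```

In a BERT-BiLSTM-CRF pipeline, the tag sequence would come from the CRF layer's decoded output, one tag per syllable; here the tags are written by hand purely to show the scoring step.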