Abstract

The named entity recognition (NER) in the field of public interest litigation can assist prosecutors in handling cases and provide them with specific entities in making legal documents. Previously, the context-free deep learning model is used to catch the semantic comprehension, in which the static word vector is obtained without considering the context. Moreover, this kind of method relies on word segmentation technology and cannot solve the error transmission caused by word segmentation inaccuracy, which brings great challenges to the Chinese NER task. To tackle the above issues, an entity recognition method based on pretraining is proposed. First, based on the basic entities, three legal ontologies, NERP, NERCGP, and NERFPP are developed to expand the named entity recognition corpus in the judicial field. Second, a variant of the pretrained model BERT (Bidirectional Encoder Representations from Transformer) called BERT-WWM (whole-word mask)-EXT(extra) is introduced to catch the text character-level word vector hierarchical and the context bidirectional features, which effectively solve the problem of task boundary division of named entities. Then, to further improve the model recognition effect, the general knowledge learned from the pretrained model is used to fit the downstream neural network BiLSTM (bi-long short-term memory), and at the end of the architecture, CRF (conditional random fields) is introduced to restrict the label relationship. Finally, the experimental results show that the proposed method is more effective than the existing methods, which reach 96% and 90% in the F1 index of NER and NERP entities, respectively.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.