Abstract

Text in the power equipment domain is difficult to process because its entity names are diverse and its feature information is hard to extract adequately. Commonly used named entity recognition (NER) methods achieve low accuracy on such text. To address this, this paper proposes an NER method for the power equipment domain based on a BERT + BiLSTM + CRF (Bidirectional Encoder Representations from Transformers + Bidirectional Long Short-Term Memory + Conditional Random Field) model. The method treats entity recognition as sequence labeling over defect-related entity types such as defects, equipment, and components. First, rules and a domain dictionary are used for distant supervision to label entities and build the initial corpus. Second, the model is trained by combining the pre-trained BERT language model with BiLSTM and CRF layers, and the CRF feature template set is derived from word co-occurrence statistics in this domain. Finally, the training results are used to improve recognition accuracy. The results show that the method improves NER accuracy in the power equipment domain. It is an effective NER approach and may offer new ideas for NER in other domains.
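As a rough illustration of the distant-supervision step, the following sketch labels a tokenized sentence with BIO tags by forward maximum matching against a domain dictionary. The dictionary entries, the tag names EQU (equipment), COM (component), and DEF (defect), and the longest-match strategy are all assumptions for illustration; the paper's actual rules and dictionary are not given in the abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical domain dictionary: token sequence -> entity type.
# EQU = equipment, COM = component, DEF = defect (labels assumed for illustration).
Lexicon = Dict[Tuple[str, ...], str]


def distant_label(tokens: List[str], lexicon: Lexicon) -> List[str]:
    """Tag tokens with BIO labels by forward maximum matching against a lexicon."""
    max_len = max((len(k) for k in lexicon), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first so "main transformer" beats "transformer".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            entity_type = lexicon.get(tuple(tokens[i:i + n]))
            if entity_type is not None:
                tags[i] = "B-" + entity_type
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + entity_type
                matched = n
                break
        # Skip past the matched entity, or advance one token if nothing matched.
        i += matched if matched else 1
    return tags


if __name__ == "__main__":
    lexicon = {
        ("main", "transformer"): "EQU",
        ("transformer",): "EQU",
        ("oil", "tank"): "COM",
        ("oil", "leakage"): "DEF",
    }
    sentence = "the main transformer oil tank has oil leakage".split()
    for tok, tag in zip(sentence, distant_label(sentence, lexicon)):
        print(tok, tag)
```

Matching the longest entry first keeps a nested entry such as "transformer" from splitting "main transformer". Labels produced this way are noisy, so in a pipeline like the one described they would serve only as the initial corpus for training the BERT + BiLSTM + CRF model.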

