Abstract
Because lexicon information and pre-trained models such as BERT offer complementary advantages, combining them has been widely adopted in Chinese sequence labeling tasks. However, these models demand large amounts of training data, and efforts have been made to improve their performance in low-resource scenarios. Certain specialized domains, such as agriculture, the industrial sector, and the metallurgical industry, currently suffer from a scarcity of data. Consequently, there is a dearth of effective models for entity relationship recognition when data are limited. Motivated by this, we construct a suitable small, balanced dataset and propose a domain-based named entity recognition (NER) model. Firstly, we construct a domain-specific dictionary from mine hoist equipment and fault texts and generate a dictionary tree to obtain word vector information. Secondly, we use a Lexicon Adapter to obtain the vectors of the domain-specific dictionary words matched by each character and calculate the weights between these word vectors, integrating position encoding to enhance their positional information. Finally, we incorporate the word vector information into the feature extraction layer to enhance the boundary information of domain entities and mitigate the semantic loss caused by using only character-level feature representations. Experimental results on a manually annotated dataset of mine hoist fault texts show that this method outperforms BiLSTM, BiLSTM-CRF, BERT, BERT-BiLSTM-CRF, and LEBERT, effectively improving the accuracy of NER for mine hoist faults.
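As an illustration of the dictionary-tree step described above, the following is a minimal sketch of how a trie built from a domain lexicon could be used to find, for each character of a sentence, the lexicon words that cover it. It is not the authors' implementation: the class and function names (`LexiconTrie`, `words_per_character`) and the four lexicon entries are hypothetical, and the sketch stops at word matching. Looking up word vectors, weighting them, and adding position encoding are not shown.

```python
class TrieNode:
    """One node of the dictionary tree; children are keyed by character."""

    def __init__(self):
        self.children = {}
        self.is_word = False


class LexiconTrie:
    """Hypothetical dictionary tree over a domain lexicon."""

    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def match_from(self, text, start):
        """Return every lexicon word that begins at text[start]."""
        node = self.root
        found = []
        for end in range(start, len(text)):
            node = node.children.get(text[end])
            if node is None:
                break
            if node.is_word:
                found.append(text[start:end + 1])
        return found


def words_per_character(text, trie):
    """For each character position, list the lexicon words that cover it."""
    covering = [[] for _ in text]
    for start in range(len(text)):
        for word in trie.match_from(text, start):
            for pos in range(start, start + len(word)):
                covering[pos].append(word)
    return covering


# Illustrative lexicon entries (assumed, not taken from the paper's dictionary).
lexicon = ["提升机", "制动器", "制动", "故障"]
trie = LexiconTrie(lexicon)
sentence = "提升机制动器故障"
for ch, words in zip(sentence, words_per_character(sentence, trie)):
    print(ch, words)
```

In a model of the kind the abstract describes, the word list attached to each character would then be mapped to word vectors and fused with the character representation. Here the lists are only printed, so that the matching behavior can be inspected on its own.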