Abstract
Electric power audit text classification is an important research problem in electric power systems. Recently, various automatic classification methods based on machine learning or deep learning models have been applied to these texts. Advances in computing technology have made “pre-training and fine-tuning” the dominant paradigm for text classification, achieving better results than earlier fully supervised models. According to pre-training theory, domain-related pre-training tasks can enhance the performance of downstream tasks in that domain. However, existing pre-trained models are usually trained on general corpora and make no use of texts from the electric power field, especially electric power audit texts. As a result, such models learn little electric-power-related morphology or semantics during pre-training, so less domain information is available during fine-tuning. Motivated by this, we propose EPAT-BERT, a BERT-based model pre-trained with two tasks of different granularity: a word-level masked language model and an entity-level masked language model. These two tasks predict words and entities in electric-power-related texts, allowing the model to learn rich electric-power-specific morphology and semantics. We then fine-tune EPAT-BERT for the electric power audit text classification task. Experimental results show that EPAT-BERT significantly outperforms fully supervised machine learning models, neural network models, and general pre-trained language models across a variety of evaluation metrics. Therefore, EPAT-BERT can be further applied to electric power audit text classification. We also conduct ablation studies to verify the effectiveness of each component of EPAT-BERT and to further illustrate our motivations.
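The abstract describes the two pre-training granularities only at a high level. The minimal Python sketch below illustrates the general idea of word-level versus entity-level masking; it is not the paper's implementation, and the function names, masking probabilities, and toy entity spans are illustrative assumptions. The key difference is that entity-level masking hides all tokens of a domain entity together, so the model must recover the whole term rather than a single subword.

```python
import random

MASK = "[MASK]"

def word_level_mask(tokens, mask_prob=0.15, rng=None):
    """Word-level MLM: each token is masked independently (BERT-style)."""
    rng = rng or random.Random(0)
    labels = [None] * len(tokens)   # None = token not selected for prediction
    masked = list(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok          # model must predict the original token
            masked[i] = MASK
    return masked, labels

def entity_level_mask(tokens, entity_spans, mask_prob=0.5, rng=None):
    """Entity-level MLM: all tokens of a selected entity are masked together,
    forcing the model to recover whole domain entities (e.g. power-grid terms)."""
    rng = rng or random.Random(0)
    labels = [None] * len(tokens)
    masked = list(tokens)
    for start, end in entity_spans:  # [start, end) token indices of an entity
        if rng.random() < mask_prob:
            for i in range(start, end):
                labels[i] = tokens[i]
                masked[i] = MASK
    return masked, labels

# Toy example: "transformer substation" is one domain entity (tokens 3-4).
tokens = ["the", "audit", "covers", "transformer", "substation", "maintenance"]
spans = [(3, 5)]
print(word_level_mask(tokens))
print(entity_level_mask(tokens, spans))
```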