Abstract

A variety of Chinese textual operational text data has been recorded during the operation and maintenance of the high-speed railway catenary system. Such defect text records can facilitate defect detection and defect severity analysis if mined efficiently and accurately. Therefore, in this context, this paper focuses on a specific problem in defect text mining, which is to efficiently extract defect-relevant information from catenary defect text records and automatically identify catenary defect severity. The specific task is transformed into a machine learning problem for defect text classification. First, we summarize the characteristics of catenary defect texts and construct a text dataset. Second, we use BERT to learn defect texts and generate word embedding vectors with contextual features, fed into the classification model. Third, we developed a deep text categorization network (DTCN) to distinguish the catenary defect level, considering the contextualized semantic features. Finally, the effectiveness of our proposed method (BERT-DTCN) is validated using a catenary defect textual dataset collected from 2016 to 2018 in the China Railway Administration in Chengdu, Lanzhou, and Hengshui. Moreover, BERT-DTCN outperforms several competitive methods in terms of accuracy, precision, recall, and F1-score value.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.