Abstract

On-board system fault knowledge base (KB) is a collection of fault causes, maintenance methods, and interrelationships among on-board modules and components of high-speed railways, which plays a crucial role in knowledge-driven dynamic operation and maintenance (O&M) decisions for on-board systems. To solve the problem of multi-source heterogeneity of on-board system O&M data, an entity matching (EM) approach using the BERT model and semi-supervised incremental learning is proposed. The heterogeneous knowledge fusion task is formulated as a pairwise binary classification task of entities in the knowledge units. Firstly, the deep semantic features of fault knowledge units are obtained by BERT. We also investigate the effectiveness of knowledge unit features extracted from different hidden layers of the model on heterogeneous knowledge fusion during model fine-tuning. To further improve the utilization of unlabeled test samples, a semi-supervised incremental learning strategy based on pseudo labels is devised. By selecting entity pairs with high confidence to generate pseudo labels, the label sample set is expanded to realize incremental learning and enhance the knowledge fusion ability of the model. Furthermore, the model's robustness is strengthened by embedding-based adversarial training in the fine-tuning stage. Based on the on-board system's O&M data, this paper constructs the fault KB and compares the model with other solutions developed for related matching tasks, which verifies the effectiveness of this model in the heterogeneous knowledge fusion task of the on-board system.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.