Abstract

During high-speed train operation, on-board safety computers record large volumes of text-based log data. Machine learning methods can help technicians use these logs to make correct fault diagnosis decisions. However, the log data are imbalanced, which degrades diagnosis performance and lowers accuracy on the fault class. To address this problem, this work proposes a fault diagnosis method for on-board equipment based on imbalanced text classification. First, the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm is used to extract text features from the on-board log data and convert them into vectors. Then an improved bagging ensemble model based on the Kernel Extreme Learning Machine (KELM) is established. The model builds the ensemble classifier within the bagging framework, with KELM as the base classifier. Balanced subsets are created by randomly under-sampling the majority class, and each subset is used to train one base classifier. In this way, an imbalanced classification problem is converted into several balanced classification problems, which ensures the diversity of the base classifiers and improves recognition of the fault class. Experiments on the on-board log data of a railway bureau show that the model improves fault diagnosis performance as measured by accuracy, recall, precision, F-measure, the ROC curve, and AUC.
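The pipeline described above (TF-IDF vectors, balanced subsets from under-sampling, one KELM per subset) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `tfidf`, `KELM`, `fit_balanced_bagging`, and `predict` are hypothetical, and the RBF kernel, the values of `c`, `gamma`, and `n_estimators`, the label encoding (+1 fault, -1 normal), and the majority-vote combination are all choices made here for illustration, since the abstract does not specify them.

```python
import numpy as np

def tfidf(docs):
    """Whitespace-tokenised TF-IDF with smoothed IDF and L2-normalised rows."""
    vocab = sorted({tok for d in docs for tok in d.split()})
    index = {tok: j for j, tok in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for tok in d.split():
            tf[i, index[tok]] += 1.0
    df = (tf > 0).sum(axis=0)                       # document frequency per term
    idf = np.log((1 + len(docs)) / (1 + df)) + 1.0  # smoothed IDF (an assumption)
    x = tf * idf
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0, 1.0, norms)

def rbf_kernel(a, b, gamma):
    """Pairwise RBF kernel exp(-gamma * ||a_i - b_j||^2)."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

class KELM:
    """Kernel ELM for binary labels in {-1, +1}: beta = (I/C + K)^-1 y."""
    def __init__(self, c=10.0, gamma=1.0):  # assumed hyperparameters
        self.c, self.gamma = c, gamma

    def fit(self, x, y):
        self.x = x
        k = rbf_kernel(x, x, self.gamma)
        self.beta = np.linalg.solve(np.eye(len(x)) / self.c + k, y.astype(float))
        return self

    def decision(self, x):
        return rbf_kernel(x, self.x, self.gamma) @ self.beta

def fit_balanced_bagging(x, y, n_estimators=11, seed=0):
    """Train one KELM per balanced subset; assumes +1 (fault) is the minority."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == -1)
    models = []
    for _ in range(n_estimators):
        # Under-sample the majority class down to the minority class size.
        sampled = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, sampled])
        models.append(KELM().fit(x[idx], y[idx]))
    return models

def predict(models, x):
    """Combine base classifiers by majority vote (an assumed combination rule)."""
    votes = np.sign(np.stack([m.decision(x) for m in models]))
    return np.where(votes.sum(axis=0) >= 0, 1, -1)
```

Every base classifier sees all fault samples but a different random slice of the normal samples, which is where the diversity mentioned above comes from. For brevity the sketch computes TF-IDF over the whole corpus at once; a real system would fit the vocabulary and IDF weights on training logs only.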
