Abstract

The Lanzhou‐Xinjiang (Lan‐Xin) high‐speed railway is one of the principal sections of the railway network in western China, and signal equipment is of great importance in ensuring its safe and efficient operation. Over many years of operation and maintenance, the railway signaling and communications department has recorded a large amount of unstructured natural-language text about equipment faults. However, because these records were not written in a consistent format, they are difficult to use directly. In this paper, a method based on natural language processing (NLP) was adopted to analyze and classify this information. First, the Latent Dirichlet Allocation (LDA) topic model was used to extract the semantic features of the text, which were then expressed in the corresponding topic feature space. Next, the Support Vector Machine (SVM) algorithm was used to construct a signal equipment fault diagnostic model that reduced the impact of sample data imbalance on classification accuracy. This model was compared with the traditional Naive Bayes (NB), Logistic Regression (LR), Random Forest (RF), and K‐Nearest Neighbor (KNN) algorithms. Signal equipment failure text data from the Lan‐Xin high‐speed railway were used to conduct the experimental analysis and verify the effectiveness of the proposed method. The experiments showed that the accuracy of the SVM classification algorithm could reach 0.84 when combined with the LDA topic model, which indicates that the natural language processing method can effectively diagnose signal equipment faults and can help guide the maintenance of field signal equipment.

Highlights

  • Railway signal equipment mainly includes railway signals, station interlocking equipment, and section blocking equipment. The main function of this equipment is to ensure the safety of train operation and shunting work and to increase the capacity of the railway.

  • Precision is usually used to evaluate the performance of a classifier. The precision rate is the proportion of truly positive samples among all samples that the model predicts to be positive.

  • Experimental Analysis of Fault Diagnosis Algorithm. Through a comparative analysis of the above experiments, it can be seen that the method of fault diagnosis of railway signal equipment based on the Latent Dirichlet Allocation (LDA) and Support Vector Machine (SVM) models is better than the established method of combining word feature space with various classifiers.
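
The precision rate described in the highlights can be illustrated with a short Python sketch. The function name, the fault labels, and the sample predictions are hypothetical and are not taken from the paper; the code only shows the definition TP / (TP + FP) for one class treated as positive.

```python
def precision(y_true, y_pred, positive):
    """Precision for one class: TP / (TP + FP)."""
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    # False positives: predicted positive but actually another class.
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    predicted_positive = tp + fp
    # Return 0.0 when the class is never predicted, to avoid dividing by zero.
    return tp / predicted_positive if predicted_positive else 0.0


# Hypothetical labels for five fault records (illustrative only).
y_true = ["switch", "switch", "signal", "track", "switch"]
y_pred = ["switch", "signal", "signal", "switch", "switch"]

switch_precision = precision(y_true, y_pred, "switch")
signal_precision = precision(y_true, y_pred, "signal")
```

In this toy case, three records are predicted as "switch" and two of them are correct, so the precision for that class is 2/3.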


Summary

Introduction

Railway signal equipment mainly includes railway signals, station interlocking equipment, and section blocking equipment. The main function of this equipment is to ensure the safety of train operation and shunting work and to increase the capacity of the railway. Based on an analysis of the features of the fault record text, this study used machine learning and natural language processing, together with the LDA topic model, to extract the word features and the corresponding topic features from the fault description text for railway signal equipment. Each fault document was transformed into a topic feature space representation, which effectively reduced the feature dimension and made the data easier to process and use. Because the recorded fault data were unevenly distributed across classes, the Support Vector Machine (SVM) classifier was selected to classify the faults. The SVM classifier is not sensitive to unbalanced data and is recognized as one of the most effective models for processing small data samples.
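
The pipeline described above (word counts, then LDA topic features, then an SVM classifier) might be sketched as follows with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy English records, the class labels, the number of topics, and the linear kernel are all hypothetical, and the original study worked on fault records from the Lan‐Xin railway, which would need their own word segmentation step.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy fault records; the study's real data is not reproduced here.
records = [
    "track circuit red band relay contact poor",
    "track circuit shunt failure rail insulation damaged",
    "switch machine no indication contact oxidized",
    "switch machine cannot lock foreign object in point",
    "signal lamp filament broken main lamp off",
    "signal lamp off power board fault",
]
labels = ["track_circuit", "track_circuit", "switch", "switch", "signal", "signal"]

N_TOPICS = 3  # assumed value; the paper's topic count is not given in this section

pipeline = Pipeline([
    # Word feature space: one count per vocabulary term.
    ("counts", CountVectorizer()),
    # Topic feature space: each document becomes a distribution over N_TOPICS topics.
    ("lda", LatentDirichletAllocation(n_components=N_TOPICS, random_state=0)),
    # Classifier on the low-dimensional topic features (kernel choice is an assumption).
    ("svm", SVC(kernel="linear")),
])
pipeline.fit(records, labels)

# Inspect the reduced representation that the SVM actually sees.
word_counts = pipeline.named_steps["counts"].transform(records)
topic_features = pipeline.named_steps["lda"].transform(word_counts)
predictions = pipeline.predict(records)
```

Each row of `topic_features` has only `N_TOPICS` entries, compared with one column per vocabulary word in `word_counts`, which is the dimension reduction the text refers to.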

Fault Text Analysis of Railway Signal Equipment
Experimental Analysis
Conclusions