Abstract

The collection of reliable, high-quality data is a prerequisite for effective and efficient asset management of rail infrastructure and rolling stock, and for meeting the requirements of asset owners and service providers. This paper highlights the importance of recovering missing information in railway asset management, and analyzes and discusses the advanced models and algorithms that have been applied to recovering missing data. By comparing these models and algorithms, a procedure is proposed to guide the selection of appropriate models for different missing-data scenarios. Using the newly developed framework with one dataset from each scenario, new models with different structures are trained, and the most suitable model is selected and used to recover the missing data. The selected model's performance is evaluated using data with known or clearly identified missing-data mechanisms. Challenges in applying advanced algorithms to recover missing data are discussed.

Highlights

  • Missing data occurs frequently within asset condition monitoring and needs to be understood when asset condition predictions and calculations are being undertaken [1]

  • Any real-world data collection system for asset condition monitoring is likely to contain data that are missing at random (MAR) and missing completely at random (MCAR)

  • Missing data can be described by three mechanisms [15]: a) Missing completely at random (MCAR), where the data is missing independently of both observed and unobserved data; b) Missing at random (MAR), where the missingness may depend on observed data but not on the unobserved values themselves; c) Missing not at random (MNAR), where the missingness is related to the values of the unobserved data


Summary

INTRODUCTION

Missing data occurs frequently within asset condition monitoring and needs to be understood when asset condition predictions and calculations are being undertaken [1]. Missing data can be described by three mechanisms [15]: a) Missing completely at random (MCAR), where the data is missing independently of both observed and unobserved data; b) Missing at random (MAR), where the missingness may depend on observed data but not on the unobserved values themselves; c) Missing not at random (MNAR), where the missingness is related to the values of the unobserved data. These three types of missing data are shown in the accompanying diagram, which is modified from [16] for asset condition data analysis. Any initial analysis of the data requires an assessment metric for model selection, both to determine the impact of the missing data and to measure the goodness of fit of different imputation methods (e.g., machine learning, linear discriminant analysis). Other examples of goodness-of-fit criteria include the chi-square test [33].
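The difference between the three mechanisms, and the need for an assessment metric, can be illustrated with a small simulation. The sketch below is not taken from the paper: the channel names `load` and `wear`, the drop probabilities, and the use of mean imputation scored by root-mean-square error (RMSE) are all assumptions chosen for illustration. It uses the standard reading of the mechanisms, in which MCAR missingness ignores all data, MAR missingness depends only on an observed variable, and MNAR missingness depends on the value that goes missing.

```python
import math
import random
import statistics


def make_mask(load, wear, mechanism, rng):
    """Return a list of booleans; True means the wear reading is missing."""
    load_median = statistics.median(load)
    wear_median = statistics.median(wear)
    mask = []
    for l, w in zip(load, wear):
        if mechanism == "MCAR":
            p = 0.3                               # constant, ignores all data
        elif mechanism == "MAR":
            p = 0.5 if l > load_median else 0.1   # depends on the observed load only
        elif mechanism == "MNAR":
            p = 0.5 if w > wear_median else 0.1   # depends on the hidden value itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        mask.append(rng.random() < p)
    return mask


def mean_impute_rmse(wear, mask):
    """Fill missing entries with the observed mean; score against the hidden truth."""
    observed = [w for w, m in zip(wear, mask) if not m]
    hidden = [w for w, m in zip(wear, mask) if m]
    fill = statistics.fmean(observed)
    rmse = math.sqrt(statistics.fmean([(w - fill) ** 2 for w in hidden]))
    return fill, rmse


# Hypothetical data: `load` is always recorded, `wear` is subject to missingness.
rng = random.Random(0)
load = [rng.gauss(100.0, 15.0) for _ in range(2000)]
wear = [0.02 * l + rng.gauss(0.0, 0.5) for l in load]

masks = {m: make_mask(load, wear, m, rng) for m in ("MCAR", "MAR", "MNAR")}
results = {m: mean_impute_rmse(wear, mask) for m, mask in masks.items()}

true_mean = statistics.fmean(wear)
for mechanism, (fill, rmse) in results.items():
    print(f"{mechanism}: imputed value {fill:.3f} "
          f"(true mean {true_mean:.3f}), RMSE on hidden entries {rmse:.3f}")
```

Because the true values are known before masking, the RMSE can be computed directly, which is the same idea as evaluating a model on data with a known missing-data mechanism. In this setup the observed mean should stay close to the true mean under MCAR and drift below it under MNAR, since high wear readings are dropped more often; a model that also uses `load` could correct the MAR case but not, in general, the MNAR case.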

SUMMARY AND COMPARISON OF ASPECTS OF MISSING DATA
MODEL DEVELOPMENT
SIMPLE NEURAL NETWORK MODEL EXAMPLE
Findings
CONCLUSION
