Pipeline companies face challenges in maintaining the integrity and reliability of their pipelines. They are moving toward predictive maintenance, using machine learning-based approaches to predict anomalies. Training machine learning models requires sufficient data, so data quality is becoming increasingly important: inaccurate data can lead to inaccurate or wrong decisions in pipeline condition assessment and in the integrity management that follows. This paper addresses data quality issues in pipeline inspection data, such as in-line inspection (ILI) data, using machine learning models. Models based on random forest regression, linear regression, and nearest-neighbor methods were tested for detecting outliers in ILI data. ILI data collected from an oil pipeline over a period of 22 years were used for testing and analysis. To verify the outlier detection results of the machine learning models, we applied statistical analysis, including the Z-score method, to check for any gaps in the analysis. The results show that all of these methods produce the same or very similar outlier detections. Hence, this study presents a robust method for field applications in the pipeline industry.
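As an illustration of how a regression model and a Z-score check can cross-validate each other, the sketch below fits an ordinary least-squares line, standardizes the residuals as z = (r - mean) / std, and flags any point with |z| > 3. It then compares the result with a plain Z-score on the raw values. This is a minimal sketch under our own assumptions, not the authors' pipeline: the single numeric feature, the 3-sigma threshold, the synthetic data, and the function names (`zscore_outliers`, `linear_fit`, `regression_outliers`) are all hypothetical. A random forest or nearest-neighbor regressor could take the place of `linear_fit` in the same residual-based scheme.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Indices of values whose |z-score| exceeds `threshold` (population std)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def regression_outliers(xs, ys, threshold=3.0):
    """Fit a linear model, then flag points whose residual has |z| > threshold."""
    slope, intercept = linear_fit(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return zscore_outliers(residuals, threshold)


# Hypothetical ILI-style readings (synthetic, assumed): feature index along the
# line vs. reported metal-loss depth (% wall thickness). Index 10 holds an
# injected bad reading.
positions = list(range(20))
depths = [10.0 if i % 2 == 0 else 12.0 for i in positions]
depths[10] = 60.0

print("Z-score outliers:   ", zscore_outliers(depths))                  # → [10]
print("Regression outliers:", regression_outliers(positions, depths))   # → [10]
```

On this synthetic series both checks flag only index 10, which mirrors the kind of agreement between model-based and statistical detection that the abstract reports. Real ILI data would involve more features and a model choice driven by the data.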