Abstract

Traditional outlier detection methods assume that all variables share the same sampling time and interval. However, in plant-wide processes the signal change rates of different devices may vary by several orders of magnitude, so the measured data in real-world systems usually have different sampling rates, which results in missing data. To achieve reliable outlier detection, a missing data probability estimation-based Bayesian outlier detection method is adopted. In this strategy, the expectation–maximization (EM) algorithm is first used to estimate the likelihood probability of different evidence under different process statuses from a historical dataset that contains both complete and incomplete samples. Second, the probability of each possible realization of the unavailable parts of the monitored sample is estimated from historical data and online moving horizon data. Bayesian theory and the likelihood probabilities are then used to calculate the outlier posterior probability of each realization. Finally, the outlier probability of the monitored sample is calculated from the probabilities of the different realizations and the corresponding outlier probabilities. A simulation on the Tennessee Eastman (TE) dataset indicates that the proposed method exhibits a significant improvement over the complete data method.
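The last two steps of the abstract (a Bayesian posterior per realization, then a weighted sum over realizations) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes two discrete evidence sources that are conditionally independent given the process status, and every number (prior, likelihood tables, realization probabilities) is made up for the example. In the method described above, the likelihoods would come from EM and the realization probabilities from historical and moving horizon data.

```python
NORMAL, OUTLIER = 0, 1

def status_posterior(prior, likelihoods, evidence):
    """Bayes' rule for discrete evidence.

    prior[s]          : P(status = s)
    likelihoods[i][s][e] : P(evidence_i = e | status = s)
    evidence[i]       : observed (or hypothesized) state of evidence_i
    Assumes the evidence sources are conditionally independent given the status.
    """
    joint = list(prior)
    for lik, e in zip(likelihoods, evidence):
        joint = [j * lik[s][e] for s, j in enumerate(joint)]
    total = sum(joint)
    return [j / total for j in joint]

def outlier_probability(prior, likelihoods, observed, realization_probs):
    """Total probability over the realizations of the missing evidence.

    observed             : state of the evidence that is available
    realization_probs[k] : P(missing evidence = k), estimated elsewhere
    """
    p_out = 0.0
    for k, p_k in enumerate(realization_probs):
        posterior = status_posterior(prior, likelihoods, [observed, k])
        p_out += p_k * posterior[OUTLIER]
    return p_out

# Illustrative values only (hypothetical, not from the paper).
prior = [0.9, 0.1]                      # P(normal), P(outlier)
lik_e1 = [[0.8, 0.2], [0.3, 0.7]]       # available evidence
lik_e2 = [[0.7, 0.3], [0.2, 0.8]]       # evidence missing at this instant
realization_probs = [0.6, 0.4]          # P(e2 = 0), P(e2 = 1)

p = outlier_probability(prior, [lik_e1, lik_e2], observed=1,
                        realization_probs=realization_probs)
print(round(p, 4))
```

The point of the weighted sum is that no single value is imputed for the missing evidence; each candidate value contributes in proportion to its estimated probability.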

Highlights

  • Outlier detection, an important research topic in data mining, has attracted wide attention in academic and applied fields

  • Considering the shortcomings of traditional methods and the multiple sampling rates of plant-wide processes, missing data probability estimation-based Bayesian outlier detection is adopted here. Because both the historical data and the online horizon data are sampled at multiple rates and are therefore incomplete, and to limit the computational complexity of plant-wide processes, the research includes four aspects: (1) to reduce complexity, variables with the same sampling period are placed in a sub-block, and PCA is performed for each sub-block to form monitoring evidence; (2) marginalization-based probability estimation for the realization of the current incomplete evidence is executed using historical multisampling rate samples and online moving horizon data; (3) the EM

  • Due to the complexity of plant-wide processes, systems usually feature multiple sampling rates, while the traditional outlier detection method typically assumes that the sampling time and interval are the same
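The sub-block idea in the highlights (variables with the same sampling period grouped together) can be sketched as below. The variable names and periods are hypothetical, the sketch assumes integer sampling periods aligned at time 0, and it covers only the grouping and the resulting missing-data pattern, not the per-block PCA.

```python
from collections import defaultdict

def build_sub_blocks(sampling_periods):
    """Group variable names by sampling period (one sub-block per period)."""
    blocks = defaultdict(list)
    for name, period in sampling_periods.items():
        blocks[period].append(name)
    return dict(sorted(blocks.items()))

def available_blocks(blocks, t):
    """Periods of the sub-blocks that deliver a measurement at instant t.

    Assumes every sensor samples at t = 0 and then every `period` instants.
    """
    return [period for period in blocks if t % period == 0]

# Hypothetical variables and sampling periods (in base time steps).
sampling_periods = {"flow": 1, "pressure": 1, "temperature": 5, "composition": 20}
blocks = build_sub_blocks(sampling_periods)

for t in (3, 10, 20):
    print(t, available_blocks(blocks, t))
```

At most instants only the fast sub-block is available, so the evidence from the slower sub-blocks is the part whose realization has to be estimated probabilistically.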


Summary

Introduction

Outlier detection, an important research topic in data mining, has attracted wide attention in academic and applied fields. The deletion method was proposed first: records with missing data are deleted directly to obtain a complete dataset. Although this method is easy to implement, a significant amount of useful information is discarded if the amount of missing data is large, and the real-time accuracy of outlier detection suffers if only the complete samples are used for modeling and detection. Because all of these methods assign a definite value to missing data and directly classify a sample as normal or as an outlier, a wrong imputed value or a wrong classification will significantly affect subsequent analysis and processing. To avoid such errors, probabilistic estimation is a good alternative.
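To make the cost of the deletion method concrete, the sketch below counts how many sampling instants are complete when variables are measured at different periods. The periods and horizon are hypothetical, and the sketch assumes integer periods aligned at time 0; it is an illustration of the argument, not a result from the paper.

```python
from functools import reduce
from math import gcd

def complete_fraction(periods, horizon):
    """Fraction of instants in range(horizon) at which every variable is sampled.

    A sample is complete only at multiples of the least common multiple
    of all sampling periods.
    """
    lcm = reduce(lambda a, b: a * b // gcd(a, b), periods)
    complete = sum(1 for t in range(horizon) if t % lcm == 0)
    return complete / horizon

# Hypothetical sampling periods of 1, 5, and 20 base time steps.
print(complete_fraction([1, 5, 20], horizon=100))  # 0.05
```

With these periods, deleting incomplete records keeps only 5% of the instants, which is why a method that can use incomplete samples is attractive.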

Problem Statement and Motivation Analysis
Bayesian Outlier Detection for Plant-Wide Processes with Multisampling Rates
Marginalization-Based Realization Estimation
Expectation–Maximization-Based Likelihood Probability Estimation
Bayesian and Full Probability-Based Outlier Detection
Simulation and Application
Findings
Conclusions