Abstract
Abnormal and missing values in the raw condition monitoring data of power equipment adversely affect condition assessment and fault diagnosis. This paper proposes a data cleaning method based on grey correlation analysis and ensemble learning. The condition monitoring data to be cleaned are first collected and preprocessed to achieve synchronization and standardization. Grey correlation analysis is then applied to select the parameters with a high correlation degree and to establish a key parameter set, which reduces the data dimension and the complexity of the model. A data cleaning model based on an ensemble learning method (random forest) is then established. After the model is trained on the data in the key parameter set, it predicts the values of the data to be cleaned. A distance discriminant method compares the predictions with the measured values to detect abnormal data, and the missing data are then filled. The example shows that the proposed method correctly identifies a large number of abnormal data points and fills in the missing data accurately. Data quality is clearly improved after cleaning, which benefits data mining, condition assessment, and fault diagnosis for power equipment.
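The parameter selection and discrimination steps can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the common Deng formulation of the grey relational grade with min-max normalization and a resolution coefficient of 0.5, and it takes the model predictions as given (any regressor, such as a random forest, could supply them). The function names, the fixed absolute-distance threshold, and the choice to replace flagged abnormal values with the prediction are all hypothetical choices made for illustration.

```python
from statistics import mean


def min_max(series):
    """Scale a series to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


def grey_relational_grades(reference, candidates, rho=0.5):
    """Grey relational grade of each candidate series against the reference.

    Assumes Deng's formulation; rho is the resolution coefficient.
    """
    ref = min_max(reference)
    # Absolute differences between the normalized reference and each candidate.
    deltas = {
        name: [abs(r - v) for r, v in zip(ref, min_max(values))]
        for name, values in candidates.items()
    }
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:
        # Every candidate coincides with the reference after normalization.
        return {name: 1.0 for name in deltas}
    return {
        name: mean([(d_min + rho * d_max) / (d + rho * d_max) for d in ds])
        for name, ds in deltas.items()
    }


def clean(measured, predicted, threshold):
    """Flag abnormal points by distance to the prediction and fill gaps.

    Missing values are given as None. Replacing abnormal values with the
    prediction is an assumption of this sketch, not a claim about the paper.
    """
    cleaned, flags = [], []
    for m, p in zip(measured, predicted):
        if m is None:
            cleaned.append(p)
            flags.append("filled")
        elif abs(m - p) > threshold:
            cleaned.append(p)
            flags.append("abnormal")
        else:
            cleaned.append(m)
            flags.append("ok")
    return cleaned, flags
```

Parameters whose grade exceeds a chosen cutoff would form the key parameter set. The cutoff and the distance threshold are tuning choices that the abstract does not specify.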