A Flexible Ensemble Algorithm for Big Data Cleaning of PMUs

Long Shen,Ruimin Duan,Xin He,Risheng Qin,Xian Meng,Cheng Guo,Mingqun Liu

doi:10.3389/fenrg.2021.695057


Open Access

https://doi.org/10.3389/fenrg.2021.695057


Journal: Frontiers in Energy Research	Publication Date: Jul 27, 2021
Citations: 2	License type: CC BY 4.0

Abstract

With the increasing deployment of Phasor Measurement Units (PMUs) in the smart grid, PMUs inevitably operate under severe conditions, which results in outliers and missing data. However, conventional techniques take excessive time to clean outliers and fill missing data because they lack the support of a big data platform. In this paper, a flexible ensemble algorithm is proposed to implement precise and scalable data cleaning on the existing big data platform Apache Spark. In the proposed scheme, an ensemble model based on a soft voting approach detects outliers by combining three sub-detectors: K-means in conjunction with principal component analysis (PCA), a Gaussian mixture model (GMM), and isolation forest (iForest). After outlier detection, the scheme fills the data using a gradient boosting decision tree (GBDT) for each extracted PMU feature. Test results on simulated and real-world PMU data show that the proposed model achieves high accuracy and recall in comparison with the local outlier factor (LOF) algorithm and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The mean absolute error (MAE), root mean square error (RMSE), and R2-score criteria are used to validate the proposed method's data filling results against contemporary techniques such as decision tree and linear regression algorithms.
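The three data filling criteria named in the abstract can be computed directly from held-out true values and the values a filling model produced. The sketch below is illustrative only: it uses plain Python rather than Apache Spark, and the function names and the sample values (`true_values`, `filled_values`) are hypothetical and do not come from the paper.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """MAE: average absolute difference between true and filled values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_square_error(y_true, y_pred):
    """RMSE: square root of the average squared difference."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

def r2_score(y_true, y_pred):
    """R2 score: 1 minus residual sum of squares over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical per-unit voltage magnitudes: the values that were masked out,
# and the values some filling model (GBDT, decision tree, ...) predicted.
true_values = [1.00, 1.02, 0.98, 1.01]
filled_values = [1.01, 1.00, 0.99, 1.01]

mae = mean_absolute_error(true_values, filled_values)
rmse = root_mean_square_error(true_values, filled_values)
r2 = r2_score(true_values, filled_values)
print(f"MAE={mae:.4f} RMSE={rmse:.4f} R2={r2:.4f}")
```

Lower MAE and RMSE and an R2 score closer to 1 indicate that the filled values track the true measurements more closely, which is how competing filling models would be ranked under these criteria.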

Highlights

Driven by the increasing demand for accurate control and management in smart grids, many advanced online monitoring devices, notably Phasor Measurement Units (PMUs), have been installed and provide abundant operating data.
We adopt an ensemble method with three sub-detectors: K-means combined with principal component analysis (PCA-Kmeans), a Gaussian Mixture Model (GMM), and isolation forest (iForest).
The ensemble of the PCA-Kmeans, GMM, and iForest sub-detectors is proposed in order to obtain more accurate outlier detection.
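The soft voting step that merges the three sub-detectors can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each sub-detector already yields one anomaly score per sample (higher meaning more anomalous), and the min-max normalization, equal weights, 0.5 threshold, and all sample scores are assumptions introduced here.

```python
def min_max(scores):
    """Rescale one detector's scores to [0, 1] so detectors are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def soft_vote(score_lists, weights=None, threshold=0.5):
    """Average normalized scores across detectors and flag likely outliers.

    score_lists: one list of anomaly scores per sub-detector, all equal length.
    weights: optional per-detector weights (assumed equal if omitted).
    threshold: assumed cut-off on the combined score.
    """
    n_detectors = len(score_lists)
    if weights is None:
        weights = [1.0 / n_detectors] * n_detectors
    normalized = [min_max(scores) for scores in score_lists]
    n_samples = len(normalized[0])
    combined = [
        sum(w * scores[i] for w, scores in zip(weights, normalized))
        for i in range(n_samples)
    ]
    flags = [c > threshold for c in combined]
    return combined, flags

# Hypothetical scores for five PMU samples from the three sub-detectors.
pca_kmeans_scores = [0.10, 0.20, 0.15, 2.50, 0.12]  # e.g. distance to nearest centroid
gmm_scores = [1.0, 1.2, 0.9, 9.0, 1.1]              # e.g. negative log-likelihood
iforest_scores = [0.40, 0.45, 0.42, 0.80, 0.41]     # e.g. isolation-based score

combined, flags = soft_vote([pca_kmeans_scores, gmm_scores, iforest_scores])
print(flags)
```

Unlike hard (majority) voting, soft voting keeps each detector's degree of suspicion, so a sample that one detector rates as extremely anomalous can still be flagged even if the others are only mildly suspicious.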

Summary


With the increasing deployment of Phasor Measurement Units (PMUs) in the smart grid, PMUs inevitably operate under severe conditions, which results in outliers and missing data. Conventional techniques take excessive time to clean outliers and fill missing data because they lack the support of a big data platform. A flexible ensemble algorithm is proposed to implement precise and scalable data cleaning on the existing big data platform Apache Spark. The mean absolute error, root mean square error, and R2-score criteria are used to validate the proposed method's data filling results against contemporary techniques such as decision tree and linear regression algorithms.
