Abstract
Because data scheduling is diverse and data distribution is differentiated across multiple levels, multi-source heterogeneous electric power data show serious deviations in spatial and temporal scheduling, which makes outliers difficult to identify. This paper therefore proposes a method for identifying outliers in multi-source heterogeneous electric power data based on parallel clustering and the AdaBoost algorithm. First, a sampling node model of the heterogeneous data is constructed. Random numbers following a Gaussian distribution are then generated according to the mean value of the data, and their mean square loss is optimized to form an objective function for outliers in the data. Using parallel clustering and the AdaBoost algorithm, the data are brought into a unified format and the load control results for the outliers are obtained, which completes the identification of outliers in the multi-source heterogeneous electric power data. Experimental results show that at the 18 s time point the data identification quality of the method reaches 80 and remains stable as time increases, demonstrating that the method achieves good recognition results.
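The Gaussian-sampling and mean-square-loss step described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the variance of the Gaussian, the exact objective, or the decision rule, so several choices here are assumptions: matching the Gaussian's standard deviation to the data, the threshold multiplier `k`, and the hypothetical helper names `gaussian_reference`, `mean_square_loss`, and `flag_outliers`. The parallel clustering and AdaBoost stages are not shown.

```python
import random
import statistics
from typing import List


def gaussian_reference(data: List[float], n: int, seed: int = 0) -> List[float]:
    """Draw n Gaussian random numbers centred on the data mean.

    Assumption (not from the source): the standard deviation is set to the
    population standard deviation of the data.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def mean_square_loss(values: List[float], center: float) -> float:
    """Average squared deviation of values from center."""
    return sum((v - center) ** 2 for v in values) / len(values)


def flag_outliers(data: List[float], k: float = 3.0,
                  n_ref: int = 1000, seed: int = 0) -> List[int]:
    """Return indices of points whose squared deviation exceeds k times the
    mean square loss of the Gaussian reference sample.

    Hypothetical decision rule: k is an illustrative threshold multiplier.
    """
    ref = gaussian_reference(data, n_ref, seed)
    center = statistics.fmean(ref)
    baseline = mean_square_loss(ref, center)  # expected squared deviation under the Gaussian model
    return [i for i, x in enumerate(data) if (x - center) ** 2 > k * baseline]
```

For example, in a series of readings that hover around 10 with a single reading of 60, only the index of the 60 is flagged. Because the reference statistics are computed from data that include the outlier, this simple rule becomes less sensitive as outliers grow more numerous, which is one reason a method like the one proposed adds clustering and a boosted classifier on top.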