Abstract

Outliers caused by manual operation errors and equipment acquisition failures occur frequently and pose challenges for big data analysis. To address the difficulty of identifying and correcting outliers in multi-source data, this paper proposes an intelligent identification and order-sensitive correction method for outliers from multiple data sources, based on historical data mining. First, an intelligent method for identifying outliers in single-source data is proposed based on neural tangent kernel K-means (NTKKM) clustering. The original data are mapped to a high-dimensional feature space using the neural tangent kernel, where K-means clustering captures the features of outliers so that they can be identified accurately. Second, an order-sensitive missing value imputation framework for multi-source data (OMSMVI) is proposed. A similarity graph of the sources with missing data is constructed from a multidimensional similarity analysis, and the choice of filling order is formulated as an optimization problem whose solution gives the optimal order in which to fill missing values across sources. Finally, a neighborhood-based imputation (NI) algorithm is proposed. Building on the traditional KNN filling algorithm, it flexibly selects the neighboring nodes of sources with missing data to achieve accurate correction of outliers. A case study was conducted on actual power grid data. The results show that the proposed clustering method identifies outliers more accurately and that imputing in the determined optimal order yields higher accuracy, which provides a feasible new approach to identifying and correcting outliers during data preprocessing.
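The abstract says the NI algorithm builds on the traditional KNN filling algorithm. As a point of reference, the following is a minimal sketch of that KNN baseline only, not of the authors' NI algorithm, whose neighbor-selection rule is not given here. The function name `knn_impute`, the parameter `k`, the distance measure (root-mean-square difference over the columns both rows observe), and the toy data are all assumptions made for illustration.

```python
import numpy as np

def knn_impute(data, k=2):
    """Fill each NaN with the mean of that column over the k nearest rows.

    Hypothetical helper: distance is the root-mean-square difference
    over the columns that both rows observe (an assumed choice).
    """
    data = np.asarray(data, dtype=float)
    filled = data.copy()
    missing = np.isnan(data)
    for i in np.where(missing.any(axis=1))[0]:
        for j in np.where(missing[i])[0]:
            candidates = []
            for r in range(data.shape[0]):
                # A donor row must be a different row that observes column j.
                if r == i or missing[r, j]:
                    continue
                shared = ~missing[i] & ~missing[r]
                if not shared.any():
                    continue
                diff = data[i, shared] - data[r, shared]
                dist = np.sqrt(np.mean(diff ** 2))
                candidates.append((dist, data[r, j]))
            if candidates:
                candidates.sort(key=lambda pair: pair[0])
                filled[i, j] = np.mean([value for _, value in candidates[:k]])
    return filled

# Toy data (made up): the third row is missing its last value.
readings = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [0.9, 1.9, np.nan],
    [10.0, 20.0, 30.0],
]
result = knn_impute(readings, k=2)
```

In this toy case the two nearest donors are the first two rows, so the gap is filled with the mean of their last values. A fixed `k` is the rigidity that the abstract's "flexibly selected" neighbors are presumably meant to relax.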
