Real-world multivariate time series data often contain missing values, which can degrade subsequent prediction tasks. Traditional imputation methods generally consider only some of the characteristics of multivariate time series data, which can easily lead to inaccurate imputation results. In this paper, a feature correlation-based bidirectional recurrent network (BRNN-FR) is proposed to address missing values in multivariate sequence data. First, a bidirectional prediction network based on time intervals is designed; it uses both forward and reverse time series information between data points to capture, as fully as possible, how the data change over time. Second, to account for the correlation between features, a combined feature selection strategy based on the Pearson correlation coefficient and mutual information is proposed, and a multiple regression model is established to predict each feature from the others. Finally, a bidirectional network ensemble imputation algorithm based on the relationships between features is established to predict missing values. Comprehensive experiments on four public datasets show that, in the direct imputation test, BRNN-FR outperforms the comparison methods in most cases in terms of mean absolute error (MAE), root mean square error (RMSE) and coefficient of determination (R2_score). BRNN-FR also achieves a better area under the curve (AUC) in the indirect evaluation, a binary classification of in-hospital death performed on the imputed medical dataset. In regression experiments on the imputed AIR air quality dataset and the ETTH1 power transformer temperature dataset, which predict average values over the next 3 hours and 6 hours, BRNN-FR obtains the best regression results in most cases.
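The second step of the method (combined feature selection followed by a multiple regression between features) can be illustrated with a minimal sketch. This is not the authors' implementation: the equal weighting of the two scores, the histogram-based mutual information estimator, the function names and the synthetic data are all assumptions made for illustration, and the abstract does not say how the two criteria are actually combined.

```python
import numpy as np

def pearson_abs(x, y):
    """Absolute Pearson correlation coefficient between two 1-D arrays."""
    return abs(np.corrcoef(x, y)[0, 1])

def mutual_info_hist(x, y, bins=8):
    """Histogram-based mutual information estimate in nats (assumed estimator)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def select_features(X, target, k=2, bins=8):
    """Rank the other columns by an equal-weight mix of normalized |r| and
    normalized MI (the 50/50 weighting is an assumption) and keep the top k."""
    others = [j for j in range(X.shape[1]) if j != target]
    r = np.array([pearson_abs(X[:, j], X[:, target]) for j in others])
    mi = np.array([mutual_info_hist(X[:, j], X[:, target], bins) for j in others])
    score = 0.5 * r / (r.max() + 1e-12) + 0.5 * mi / (mi.max() + 1e-12)
    order = np.argsort(score)[::-1][:k]
    return [others[i] for i in order]

def fit_regression(X, target, selected):
    """Ordinary least squares of the target column on the selected columns,
    with an intercept as the last coefficient."""
    A = np.column_stack([X[:, selected], np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, X[:, target], rcond=None)
    return coef

def predict(X, selected, coef):
    """Predict the target feature from the selected features."""
    A = np.column_stack([X[:, selected], np.ones(len(X))])
    return A @ coef

# Synthetic, fully observed data: column 3 depends on columns 0 and 1 only.
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(3, 2000))
X = np.column_stack([a, b, c, 2.0 * a - b + 0.1 * rng.normal(size=2000)])

selected = select_features(X, target=3, k=2)
coef = fit_regression(X, target=3, selected=sorted(selected))
estimate = predict(X, sorted(selected), coef)
```

In an imputation setting, such a model would be fitted only on rows where the target and the selected features are all observed, and then used to estimate the target wherever it is missing. How this estimate is combined with the bidirectional network output is not specified in the abstract and is left out of the sketch.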