Abstract

Multivariate datasets help identify trends promptly, which supports planning and decision making. However, these datasets often have missing observations, which in turn affects the quality of predictions. This is a challenging problem that is attracting increasing research attention. Although several imputation methods exist, most are designed around a specific application and fall short of the desired goals when applied to time series datasets. This paper describes a method that uses evolving Dynamic Bayesian Networks with an analytical threshold to impute missing data in time series datasets. Our method converges smoothly to a robust data imputation model and adapts continuously as new data arrive, yielding more reliable network models. We conducted tests on real and simulated datasets. To insert missing values, we considered two mechanisms: missing completely at random (MCAR) and missing not at random (MNAR). For each dataset, we used missing rates of 10%, 20%, 30% and 40%. For each missing rate, we inserted the missing values in 40% of the total intervals of each dataset. We benchmarked the results against widely used imputation methods for time series datasets. Compared with these existing methods, our method achieves a significant improvement, with lower values of the imputation performance metrics NRMSE, MAE and MedAE.
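
As an illustration of the evaluation setup described above, the following minimal sketch inserts MCAR gaps into a series at a chosen rate and scores an imputation with NRMSE, MAE and MedAE. It is not the paper's method: the function names `insert_mcar` and `imputation_errors` are hypothetical, the mean-value fill is only a placeholder baseline, NRMSE is assumed to be normalized by the range of the true values (other normalizations exist), and the interval-based insertion and the MNAR mechanism are not reproduced.

```python
import math
import random
import statistics


def insert_mcar(series, rate, seed=0):
    """Return a copy of `series` with a fraction `rate` of its values set to None.

    Positions are drawn uniformly at random, independent of the values (MCAR).
    """
    rng = random.Random(seed)
    n_missing = round(rate * len(series))
    missing_idx = set(rng.sample(range(len(series)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(series)]


def imputation_errors(truth, imputed, mask):
    """Compute NRMSE, MAE and MedAE over the positions where `mask` is True.

    Assumption: NRMSE is the RMSE divided by the range of the true values.
    """
    errors = [abs(t - p) for t, p, m in zip(truth, imputed, mask) if m]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    value_range = max(truth) - min(truth)
    return {
        "nrmse": rmse / value_range,
        "mae": statistics.fmean(errors),
        "medae": statistics.median(errors),
    }


# Toy series standing in for a real dataset (illustrative only).
truth = [float(i % 10) for i in range(100)]
observed = insert_mcar(truth, rate=0.2, seed=42)
mask = [v is None for v in observed]

# Placeholder baseline: fill every gap with the mean of the observed values.
fill = statistics.fmean(v for v in observed if v is not None)
imputed = [fill if v is None else v for v in observed]

scores = imputation_errors(truth, imputed, mask)
```

The errors are computed only at the positions that were artificially removed, so the scores reflect imputation quality rather than agreement on values that were never missing.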
