Abstract

Time series forecasting has become an important aspect of data analysis and has many real-world applications. However, missing values are often encountered, and they can adversely affect forecasting tasks. In this study, we evaluate and compare the effects of imputation methods for estimating missing values in a time series. Rather than simulating pseudo-missing data, our approach performs imputation on actual missing data and measures the performance of the forecasting models trained on the result. In the experiments, several time series forecasting models are trained on different training datasets, each prepared with one of the imputation methods. The imputation methods are then evaluated by comparing the accuracy of the resulting forecasting models. The results obtained from a total of four experimental cases show that the k-nearest neighbor technique is the most effective in reconstructing missing data and contributes positively to time series forecasting compared with the other imputation methods.
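The evaluation procedure described in the abstract (impute the training data, train a forecasting model, compare forecast accuracy) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy series, the helper names (`mean_impute`, `knn_impute`, `fit_ar1`, `forecast_rmse`), the use of time-index distance to pick neighbors, and the simple AR(1) forecaster standing in for the forecasting models.

```python
import math

def mean_impute(series):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]

def knn_impute(series, k=2):
    """Replace each missing value with the mean of the k observed values
    closest in time (assumed variant: distance is the time-index gap)."""
    observed = [(i, v) for i, v in enumerate(series) if v is not None]
    filled = []
    for i, v in enumerate(series):
        if v is None:
            nearest = sorted(observed, key=lambda p: abs(p[0] - i))[:k]
            v = sum(val for _, val in nearest) / len(nearest)
        filled.append(v)
    return filled

def fit_ar1(train):
    """Least-squares fit of y[t] = a + b * y[t-1] (stand-in forecaster)."""
    x, y = train[:-1], train[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b

def forecast_rmse(train, test):
    """Train on the imputed series, then score one-step-ahead forecasts."""
    a, b = fit_ar1(train)
    prev, squared_error = train[-1], 0.0
    for actual in test:
        squared_error += (a + b * prev - actual) ** 2
        prev = actual  # the next forecast uses the true previous value
    return math.sqrt(squared_error / len(test))

# Hypothetical toy data: a linear trend with two gaps in the training part.
train_raw = [0.0, 1.0, None, 3.0, 4.0, None, 6.0, 7.0]
test = [8.0, 9.0, 10.0]

scores = {
    "mean": forecast_rmse(mean_impute(train_raw), test),
    "knn": forecast_rmse(knn_impute(train_raw, k=2), test),
}
for name, rmse in scores.items():
    print(f"{name}: RMSE = {rmse:.3f}")
```

On this toy trend the neighbor-based fill restores the gaps from adjacent observations, so the model trained on it forecasts more accurately than the one trained on the mean-filled series. That outcome reflects the toy data only and says nothing about the paper's four experimental cases.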

Highlights

  • The recent emergence of cutting-edge computing technology, such as the Internet of Things (IoT) and big data, has resulted in a new era in which large-scale data can be generated, collected, and exploited

  • This section introduces the concept of missing data imputation in time series and the imputation methods used in the experiments

  • We evaluated the effects of imputation methods for replacing missing values with estimated values

Introduction

CMC, 2022, vol., no.1

The recent emergence of cutting-edge computing technology, such as the Internet of Things (IoT) and big data, has resulted in a new era in which large-scale data can be generated, collected, and exploited. Numerous missing values often coexist within such rich data. These missing values are considered major obstacles in data analysis because they distort the statistical properties of the data and reduce its availability. For example, when data are obtained from a questionnaire, many respondents are likely to intentionally omit a response to a question that is difficult to answer. As another example, when data are measured by machines or computer systems, various types of missing values can occur owing to mechanical defects or system malfunctions. The primary types of missing values identified in previous studies related to the field of statistics are as follows:
