Abstract
Time series data often contain a high percentage of missing values, owing to the nature of the data and the way they are collected. This paper proposes a technique for imputing missing values in time series data. A fuzzy Gaussian membership function and a fuzzy triangular membership function are each used within a data imputation algorithm to identify the best imputation for the missing values: the membership functions calculate weights for the data values of the nearest neighbors before those values are used during imputation. The evaluation results show that the proposed technique outperforms traditional data imputation techniques, and that the triangular membership function achieves higher accuracy than the Gaussian membership function.
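The weighting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract gives no formulas, so the sketch assumes that neighbors are chosen by temporal distance, that membership is computed from that distance, and that the imputed value is the membership-weighted mean of the neighbors. The names `gaussian_membership`, `triangular_membership`, `fuzzy_impute`, and the `spread` parameter are hypothetical.

```python
import math


def gaussian_membership(distance, sigma):
    """Gaussian membership: 1 at distance 0, decaying smoothly with distance."""
    return math.exp(-(distance ** 2) / (2 * sigma ** 2))


def triangular_membership(distance, width):
    """Symmetric triangular membership: 1 at distance 0, 0 once |distance| >= width."""
    return max(0.0, 1.0 - abs(distance) / width)


def fuzzy_impute(series, k=2, membership=triangular_membership, spread=None):
    """Replace None entries with a membership-weighted mean of the k nearest
    observed values (nearest in time index; an assumption of this sketch)."""
    observed = [(i, v) for i, v in enumerate(series) if v is not None]
    if not observed:
        raise ValueError("series has no observed values")

    result = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        # k observed points closest in time to the missing position.
        neighbors = sorted(observed, key=lambda p: abs(p[0] - i))[:k]
        distances = [abs(j - i) for j, _ in neighbors]
        # Default spread keeps every chosen neighbor inside the triangle.
        s = spread if spread is not None else max(distances) + 1
        weights = [membership(d, s) for d in distances]
        total = sum(weights)
        if total == 0:
            # All weights vanished (spread too small): fall back to a plain mean.
            result[i] = sum(v for _, v in neighbors) / len(neighbors)
        else:
            result[i] = sum(w * v for w, (_, v) in zip(weights, neighbors)) / total
    return result


filled_tri = fuzzy_impute([10.0, None, None, 40.0], k=2)
filled_gauss = fuzzy_impute([10.0, None, None, 40.0], k=2,
                            membership=gaussian_membership)
```

In this sketch the closer neighbor always receives the larger weight, so each imputed value is pulled toward the observation nearest to it in time. The two membership functions differ only in how quickly that weight falls off.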
Highlights
In computer science, the data quality problem rose to prominence in the 1990s with the rise of data warehouse systems, where the failure of a database project was attributed to poor data quality [1]. There are many definitions of the term “data quality”, but as noted in [2], one well-known definition used by many researchers is “fitness for use”.
The data quality dimensions consist of timeliness, which ensures that the value is up to date; consistency, which ensures that the representation of the data is the same in all cases; completeness, which ensures that the data has no missing values; and accuracy, which ensures that the recorded value is identical to the actual value [1].
The paper introduces two techniques based on fuzzy logic for imputing missing values in time series data.
Summary
The data quality problem rose to prominence in the 1990s with the rise of data warehouse systems, where the failure of a database project was attributed to poor data quality [1]. There are many definitions of the term “data quality”, but as noted in [2], one well-known definition used by many researchers is “fitness for use”. Missing at random (MAR): a variable is missing at random when the probability of missingness depends only on available information. This type can be described as conditionally missing, meaning that a value is missing under a condition; for example, if the respondent's gender is male, the survey questions related to women will be left empty. This paper aims to ensure the data quality of time series data, specifically the completeness dimension of time series data that suffer from missing values. Toward this aim, two novel techniques for imputing missing values in time series data are proposed and compared with traditional techniques. The evaluation results show that the two proposed techniques have higher accuracy than the traditional data imputation techniques.
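The MAR definition in the summary can be made concrete with a small sketch of the survey example. It is illustrative only and not taken from the paper: the function `apply_mar_mask`, its parameters, and the field names `gender` and `pregnancies` are hypothetical. The point it shows is that whether the target field is blanked depends only on a field that remains observed.

```python
import random


def apply_mar_mask(records, observed_key, target_key, trigger_value,
                   p_missing=1.0, seed=0):
    """Return copies of the records in which target_key is set to None with
    probability p_missing whenever observed_key equals trigger_value.

    Missingness depends only on the observed field, which is the MAR condition.
    """
    rng = random.Random(seed)
    masked = []
    for record in records:
        row = dict(record)  # copy so the input records are left unchanged
        if row[observed_key] == trigger_value and rng.random() < p_missing:
            row[target_key] = None
        masked.append(row)
    return masked


survey = [
    {"gender": "male", "pregnancies": 0},
    {"gender": "female", "pregnancies": 2},
    {"gender": "male", "pregnancies": 0},
    {"gender": "female", "pregnancies": 1},
]
masked_survey = apply_mar_mask(survey, "gender", "pregnancies", "male")
```

With `p_missing=1.0` every male respondent's answer is blanked and every female respondent's answer is kept, which matches the example in the text. A value below 1.0 gives the more general case where the condition raises the probability of missingness without making it certain.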
More From: International Journal of Advanced Computer Science and Applications