Abstract

Handling missing values in matrix data is an important step in data analysis. Many methods have been proposed that estimate missing values based on data pattern similarity. Most of these methods impute missing values from data trends over the entire feature space. However, individual missing values are likely to show similarity to data patterns in a local feature space. In addition, most existing methods focus on single-class data, while multiclass analysis is frequently required in various fields. Missing value imputation for multiclass data must consider the characteristics of each class. In this paper, we propose two methods based on closed itemsets, CIimpute and ICIimpute, that perform missing value imputation using local feature space for multiclass matrix data. CIimpute estimates missing values using closed itemsets extracted from each class. ICIimpute is an improved version of CIimpute that introduces an attribute reduction process. Experimental results demonstrate that attribute reduction considerably reduces computational time and improves imputation accuracy. Furthermore, compared to existing methods, ICIimpute provides superior imputation accuracy but requires more computational time.

Introduction

In data analysis, many methods do not provide accurate results when the data contain missing values [1,2,3,4]. Handling missing values is therefore a very important issue in data analysis [5,6,7]. There are two main approaches to handling missing values. One of them imputes missing values based on their similarity to the data patterns of other instances [9,10]. Most previously proposed imputation methods use trends across all instances, i.e., the entire feature space, to estimate missing values. However, it is important to estimate missing values based on local feature space.
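The proposed methods build on closed itemsets. An itemset is closed when no proper superset of it occurs in the same number of rows. The following is a minimal brute-force sketch of that definition only, not of CIimpute or ICIimpute. The toy rows, the item names, the function names, and the `min_support` threshold are all illustrative assumptions, and a real matrix would first need its values turned into discrete items.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def closed_itemsets(transactions, min_support=1):
    """Enumerate closed itemsets by brute force (suitable for toy data only)."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = support(candidate, transactions)
            if count >= min_support:
                frequent[candidate] = count
    # An itemset is closed if no proper superset has the same support.
    # Any superset with equal support is itself frequent, so checking
    # only the frequent itemsets is sufficient.
    return {
        s: c
        for s, c in frequent.items()
        if not any(s < other and c == frequent[other] for other in frequent)
    }


# Hypothetical toy data: each row is a set of discrete items.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b"}),
]

closed = closed_itemsets(transactions, min_support=2)
for itemset, count in sorted(closed.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

In this toy data, `{c}` is not closed because `{a, c}` appears in exactly the same rows. A closed itemset therefore describes a maximal pattern shared by a group of rows, which is one way to think about a local region of the feature space.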
