Abstract

Missing values in real-world data are both prevalent and unavoidable, so they must be handled carefully before any mining or learning process. This paper proposes a novel technique for imputing missing data. It employs a new version of the Fuzzy c-Means clustering algorithm that exploits the advantages of the Grey Relational Grade over Minkowski-like similarity measures. To impute a missing value more accurately, it also performs a local, mutual-information-based feature selection in each cluster so that only highly relevant features are used. Briefly, missing values are imputed in the following steps. First, the algorithm determines the importance of each missing attribute. Next, the input instances are partitioned into several fuzzy clusters. Then, the algorithm selects the clusters that satisfy a minimum condition. After that, it chooses the highly dependent features of the instances within each selected cluster using a mutual-information-based feature selection approach. Once the features are selected, regression models are applied to the selected features of the selected clusters to produce estimates of the missing value. Finally, the missing value is imputed as a weighted average of the estimates obtained in the previous step. Three well-known evaluation criteria and the accuracy of a classification task are used to assess the performance of the proposed method. Experimental results on seven UCI data sets with different missing ratios and missingness strategies indicate that the proposed algorithm generally outperforms five other imputation methods.
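The abstract names two building blocks without giving formulas: the Grey Relational Grade used as a similarity measure, and the final weighted average of per-cluster estimates. The sketch below illustrates both under stated assumptions and is not the paper's implementation. It assumes the commonly used Deng formulation of the grey relational coefficient with a distinguishing coefficient `zeta = 0.5`, features already scaled to a common range, and weights supplied by the caller (for example, fuzzy membership degrees, which is a guess on our part). The function names are hypothetical.

```python
def grey_relational_grades(reference, candidates, zeta=0.5):
    """Grey Relational Grade of each candidate instance w.r.t. a reference.

    Assumption: standard Deng formulation; the paper's exact variant
    is not given in the abstract.
    """
    # Absolute per-feature differences between each candidate and the reference.
    deltas = [[abs(c - r) for c, r in zip(row, reference)] for row in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0.0:
        # Every candidate is identical to the reference.
        return [1.0 for _ in candidates]
    # Grey relational coefficient per feature, averaged into a grade per instance.
    return [
        sum((d_min + zeta * d_max) / (d + zeta * d_max) for d in row) / len(row)
        for row in deltas
    ]


def weighted_average(estimates, weights):
    """Combine per-cluster estimates of a missing value.

    Assumption: weights are non-negative and chosen by the caller;
    the abstract does not say how the paper derives them.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(e * w for e, w in zip(estimates, weights)) / total


if __name__ == "__main__":
    grades = grey_relational_grades([0.0, 0.0], [[0.0, 0.0], [1.0, 1.0]])
    print(grades)
    print(weighted_average([2.0, 4.0], [1.0, 3.0]))
```

A grade of 1 means the candidate matches the reference on every feature; lower grades mean weaker similarity. Unlike a Minkowski distance, each grade depends on the smallest and largest differences across the whole comparison set, so the grades are relative to the candidates being compared.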

