Abstract

Incomplete data sets arise in every field of scientific study because of random noise, data loss, limitations of data acquisition, misunderstanding of data, and so on. Most clustering algorithms cannot be applied to incomplete data sets directly because objects with missing values must be preprocessed first. In this paper, we present a new imputation algorithm for incomplete data and a three-way ensemble clustering algorithm based on the imputation result. In the proposed imputation algorithm, the objects without missing values are first clustered using a hard clustering method. For each object with missing values, the mean attribute value of each cluster is used in turn to fill the missing attribute value. Perturbation analysis of the cluster centroids is then applied to find the optimal imputation. As an application of the proposed imputation method, we develop a three-way ensemble clustering algorithm using the ideas of clustering ensembles and three-way decision. Objects that receive the same cluster label in different clustering results are assigned to the core region of the corresponding cluster, while objects that receive different cluster labels are assigned to the fringe region. A three-way clustering is therefore formed naturally. Experimental results on UCI data sets verify that the algorithm is effective in revealing cluster structures.
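The imputation step described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `impute_by_cluster_means` is hypothetical, the cluster labels of the complete objects are assumed to be given by any hard clustering method, and the perturbation criterion (keep the candidate that moves its cluster centroid the least when the filled object is added) is one plausible reading of "perturbation analysis of the cluster centroid", not a detail stated in the text.

```python
import numpy as np

def impute_by_cluster_means(X, labels_complete):
    """Fill missing values (np.nan) with cluster means of the complete objects.

    X               : (n, d) array, np.nan marks a missing attribute value.
    labels_complete : hard cluster labels for the complete rows of X, in row order
                      (assumed to come from any hard clustering method).
    """
    X = np.asarray(X, dtype=float)
    labels_complete = np.asarray(labels_complete)
    complete = ~np.isnan(X).any(axis=1)
    Xc = X[complete]

    clusters = np.unique(labels_complete)
    centroids = np.array([Xc[labels_complete == c].mean(axis=0) for c in clusters])
    sizes = np.array([(labels_complete == c).sum() for c in clusters])

    filled = X.copy()
    for i in np.flatnonzero(~complete):
        missing = np.isnan(X[i])
        best_shift, best_row = np.inf, None
        for centroid, size in zip(centroids, sizes):
            # Candidate: missing attributes take this cluster's mean values.
            candidate = np.where(missing, centroid, X[i])
            # Assumed perturbation measure: how far the centroid moves
            # if the candidate joins the cluster.
            new_centroid = (size * centroid + candidate) / (size + 1)
            shift = np.linalg.norm(new_centroid - centroid)
            if shift < best_shift:
                best_shift, best_row = shift, candidate
        filled[i] = best_row
    return filled

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0],
              [np.nan, 0.5], [10.0, np.nan]])
filled = impute_by_cluster_means(X, labels_complete=[0, 0, 1, 1])
```

In this toy example each incomplete object has one candidate per cluster, and the candidate closest to its own cluster centroid is kept, since it perturbs that centroid the least.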

Highlights

  • As a new idea in artificial intelligence in recent years, granular computing [1]–[3] is a relatively modern theory for simulating human thinking and problem solving

  • As an application of the proposed imputation method, we develop a three-way ensemble clustering algorithm based on the ideas of clustering ensembles and three-way decision

  • Objects that receive the same cluster label in different clustering results are assigned to the core region of the corresponding cluster, while objects that receive different cluster labels are assigned to the fringe region

Summary

INTRODUCTION

As a new idea in artificial intelligence in recent years, granular computing [1]–[3] is a relatively modern theory for simulating human thinking and problem solving. Yu et al. [16] proposed a three-way decision clustering algorithm for incomplete data. We use ensemble clustering to present a new three-way clustering algorithm for incomplete data sets. For incomplete data sets, a single clustering algorithm cannot achieve a good clustering result because a large amount of data is missing.

Chen: Three-Way Ensemble Clustering for Incomplete Data

In this paper, we use an ensemble clustering technique to combine multiple clustering results into a probably better one. Based on the above discussion, we present a three-way ensemble clustering algorithm for incomplete data. As an application of the proposed imputation method, we develop this algorithm based on the ideas of clustering ensembles and three-way decision.
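The way an ensemble of clusterings yields core and fringe regions can be illustrated with a short sketch. It is a hedged illustration and not the paper's algorithm: the function name `three_way_regions` is hypothetical, the cluster labels are assumed to be already aligned across the base clusterings, and an object with disagreeing labels is placed in the fringe of every cluster it was assigned to at least once, which is an assumption about how the fringe is formed.

```python
import numpy as np

def three_way_regions(label_matrix):
    """Derive core and fringe regions from several hard clusterings.

    label_matrix : (m, n) array, row j holds the labels that base clustering j
                   gives to the n objects (labels assumed aligned across rows).
    Returns {cluster: {"core": set of object indices, "fringe": set of object indices}}.
    """
    L = np.asarray(label_matrix)
    regions = {}
    for k in np.unique(L):
        votes = (L == k)                 # votes[j, i]: clustering j puts object i in k
        in_all = votes.all(axis=0)       # every clustering agrees on cluster k
        in_any = votes.any(axis=0)       # at least one clustering says cluster k
        regions[int(k)] = {
            "core": set(np.flatnonzero(in_all).tolist()),
            "fringe": set(np.flatnonzero(in_any & ~in_all).tolist()),
        }
    return regions

labels = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 0, 1],
]
regions = three_way_regions(labels)
```

Here objects 0, 1 and 2 receive the same label in all three clusterings and fall into core regions, while objects 3 and 4 receive conflicting labels and fall into the fringe of both clusters.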

PRELIMINARIES
EXPERIMENTAL ILLUSTRATION
CONCLUSION