Abstract

Incomplete data with missing feature values are prevalent in clustering problems. Traditional clustering methods first estimate the missing values by imputation and then apply classical clustering algorithms for complete data, such as K-median and K-means. In practice, however, accurate estimates of the missing values are often hard to obtain, and the resulting estimation errors degrade clustering performance. To enhance the robustness of clustering algorithms, this paper represents the missing values by interval data and introduces the concept of a robust cluster objective function. A minimax robust optimization (RO) formulation is presented to provide clustering results that are insensitive to estimation errors. To solve the proposed RO problem, we propose robust K-median and K-means clustering algorithms with low time and space complexity. Comparison and analysis of experimental results on both artificially generated and real-world incomplete data sets validate the robustness and effectiveness of the proposed algorithms.
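To make the interval representation and the minimax idea concrete, the following is a minimal sketch and not the paper's algorithm. It assumes squared Euclidean distance, assumes each missing value is replaced by an interval [lower, upper] (with lower equal to upper for observed features), and assumes the worst case is taken independently per observation. The function names `worst_case_sq_dist` and `robust_assign` are hypothetical and chosen only for illustration.

```python
import numpy as np


def worst_case_sq_dist(center, lower, upper):
    """Largest squared Euclidean distance from `center` to any point
    in the box [lower, upper] (illustrative assumption, not from the paper)."""
    center = np.asarray(center, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Per feature, the farthest point of an interval from the center
    # is always one of the two endpoints.
    per_feature = np.maximum((center - lower) ** 2, (center - upper) ** 2)
    return float(per_feature.sum())


def robust_assign(centers, lower, upper):
    """Assign each interval-valued observation to the center whose
    worst-case squared distance is smallest."""
    centers = np.asarray(centers, dtype=float)  # shape (k, d)
    lower = np.asarray(lower, dtype=float)      # shape (n, d)
    upper = np.asarray(upper, dtype=float)      # shape (n, d)
    # Broadcast to shape (n, k, d): distance to each interval endpoint.
    d_lo = (centers[None, :, :] - lower[:, None, :]) ** 2
    d_hi = (centers[None, :, :] - upper[:, None, :]) ** 2
    worst = np.maximum(d_lo, d_hi).sum(axis=2)  # shape (n, k)
    return worst.argmin(axis=1), worst.min(axis=1)


# Observation 0 has a missing second feature, represented by the interval [0, 4].
# Observation 1 is fully observed, so its lower and upper bounds coincide.
lower = [[1.0, 0.0], [9.0, 9.0]]
upper = [[1.0, 4.0], [9.0, 9.0]]
centers = [[1.0, 2.0], [9.0, 8.0]]

labels, costs = robust_assign(centers, lower, upper)
```

In this sketch, minimizing the worst-case distance, rather than the distance to a single imputed value, is what makes the assignment insensitive to where the true value lies inside its interval. How the intervals are constructed and how the cluster centers are updated are left to the paper's method and are not modeled here.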

Highlights

  • In the field of data mining and machine learning, it is a common occurrence that the considered data sets contain several observations with missing feature values

  • The classical K-median algorithms have been modified based on whole data strategy (WDS), partial distance strategy (PDS), and nearest prototype strategy (NPS) to handle incomplete data sets

  • This paper considers the clustering problem for incomplete data


Summary

Introduction

In data mining and machine learning, data sets commonly contain observations with missing feature values. Such incomplete data occur in a wide array of application domains for various reasons, including improper data collection, the high cost of obtaining some feature values, and missing questionnaire responses. The first theoretical study of pattern recognition for incomplete data was conducted by Sebestyen [2] under certain probabilistic assumptions. Empirical studies on incomplete data were reported by Dixon [4] and Jain and Dubes [5].

