Abstract

Traditional clustering algorithms, such as K-Means, output a clustering that is disjoint and exhaustive, i.e., every single data point is assigned to exactly one cluster. However, in many real-world datasets, clusters can overlap and there are often outliers that do not belong to any cluster. While this is a well-recognized problem, most existing algorithms address either overlap or outlier detection and do not tackle the problem in a unified way. In this paper, we propose an intuitive objective function, which we call the NEO-K-Means (Non-Exhaustive, Overlapping K-Means) objective, that captures the issues of overlap and non-exhaustiveness in a unified manner. Our objective function can be viewed as a reformulation of the traditional K-Means objective, with easy-to-understand parameters that capture the degrees of overlap and non-exhaustiveness. By considering an extension to weighted kernel K-Means, we show that we can also apply our NEO-K-Means idea to overlapping community detection, which is an important task in network analysis. To optimize the NEO-K-Means objective, we develop not only fast iterative algorithms but also more sophisticated algorithms using low-rank semidefinite programming techniques. Our experimental results show that the new objective and algorithms are effective in finding ground-truth clusterings that have varied overlap and non-exhaustiveness; for the case of graphs, we show that our method outperforms state-of-the-art overlapping community detection algorithms.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.