Abstract

We propose a new clustering method based on a deep neural network. Given an unlabeled dataset and the number of clusters, our method directly groups the dataset into the given number of clusters in the original space. We use a conditional discrete probability distribution defined by a deep neural network as a statistical model. Our strategy is first to estimate the cluster labels of unlabeled data points selected from a high-density region, and then to conduct semi-supervised learning to train the model using the estimated cluster labels and the remaining unlabeled data points. Finally, using the trained model, we obtain the estimated cluster labels of all given unlabeled data points. The advantage of our method is that it does not require key conditions that existing methods depend on: existing clustering methods with deep neural networks assume that the cluster balance of a given dataset is uniform. Moreover, our method can be applied to various data domains as long as the data is expressed as a feature vector. In addition, we observe that our method is robust against outliers. The proposed method is therefore expected to perform better, on average, than previous methods. We conducted numerical experiments on five commonly used datasets to confirm the effectiveness of the proposed method.
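The three-stage strategy described above (label a high-density subset, use those labels to guide learning on the rest, then label every point) can be illustrated with a small toy sketch. This is not the authors' implementation. It assumes NumPy, handles only two clusters, and swaps in much simpler techniques: a k-nearest-neighbour distance as the density score, a plain two-way spectral cut in place of SGSC, and greedy nearest-neighbour label propagation in place of the semi-supervised deep network. All function names and parameter values (`k`, `frac`, `sigma`) are hypothetical.

```python
import numpy as np

def sq_dists(X):
    """Pairwise squared Euclidean distances, shape (n, n)."""
    diff = X[:, None, :] - X[None, :, :]
    return (diff ** 2).sum(axis=-1)

def select_high_density(X, k=5, frac=0.5):
    """Stage 1a: keep the points whose k-th nearest neighbour is closest."""
    kth = np.sort(np.sqrt(sq_dists(X)), axis=1)[:, k]  # column 0 is the self-distance
    m = max(2, int(frac * len(X)))
    return np.argsort(kth)[:m]

def spectral_two_way(X, sigma=2.0):
    """Stage 1b (stand-in for SGSC): split by the sign of the Fiedler vector."""
    W = np.exp(-sq_dists(X) / (2.0 * sigma ** 2))  # Gaussian affinity
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W                 # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)                    # eigenvalues in ascending order
    return (vecs[:, 1] > 0).astype(int)

def propagate(X, seed_idx, seed_labels):
    """Stages 2-3 (stand-in for the semi-supervised network): repeatedly give
    the unlabeled point nearest to any labeled point that neighbour's label."""
    D = sq_dists(X)
    labels = np.full(len(X), -1)
    labels[seed_idx] = seed_labels
    while (labels == -1).any():
        lab = np.flatnonzero(labels != -1)
        unl = np.flatnonzero(labels == -1)
        sub = D[np.ix_(unl, lab)]
        i, j = np.unravel_index(np.argmin(sub), sub.shape)
        labels[unl[i]] = labels[lab[j]]
    return labels

def cluster(X):
    seeds = select_high_density(X)
    return propagate(X, seeds, spectral_two_way(X[seeds]))

# Toy data: two Gaussian blobs (an assumption made purely for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.5, (40, 2)),
               rng.normal([6, 0], 0.5, (40, 2))])
truth = np.repeat([0, 1], 40)
pred = cluster(X)

# Cluster ids are arbitrary, so compare up to a swap of the two labels.
agreement = max((pred == truth).mean(), (pred != truth).mean())
print(f"agreement with the generating blobs: {agreement:.2f}")
```

The sketch only mirrors the order of the stages. The density score, the spectral step and the propagation rule are deliberately simple placeholders, and none of them reflects the model or training procedure used in the paper.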

Highlights

  • Clustering is one of the oldest machine-learning fields, where the objective is, given data points, to group them into clusters according to some measure

  • This special type of spectral clustering (SC) is named Selective Geodesic Spectral Clustering (SGSC), which we also propose to assist Spectral Embedded Deep Clustering (SEDC)

  • We propose a deep clustering method named SEDC


Summary

Introduction

Clustering is one of the oldest machine-learning fields, where the objective is, given data points, to group them into clusters according to some measure. Thanks to the development of deep neural networks, we can handle large datasets with complicated shapes [9]. One major direction in these studies is to combine deep AutoEncoders (AE) [10] with classical clustering methods. The AE is used to obtain a clustering-friendly, low-dimensional representation. Another major direction is to group a given unlabeled dataset into clusters directly in the original input space by employing a deep neural network to model the distribution of cluster labels. Popular methods exist in both directions.

Direct Methods
Related Works
Existing Clustering Methods Using Deep Neural Network
Related Techniques with Proposed Method
Spectral Clustering
Virtual Adversarial Training
Proposed Method
Selective Geodesic Spectral Clustering
Spectral Embedded Deep Clustering
Computational and Space Complexity of SEDC
Numerical Experiments
Datasets and Evaluation Metric
Performance Evaluation of SGSC
Performance Evaluation of SEDC
Method
Conclusions and Future Work
