Abstract
We propose a new clustering method based on a deep neural network. Given an unlabeled dataset and the number of clusters, our method directly groups the dataset into the given number of clusters in the original space. We use a conditional discrete probability distribution defined by a deep neural network as a statistical model. Our strategy is first to estimate the cluster labels of unlabeled data points selected from a high-density region, and then to conduct semi-supervised learning to train the model by using the estimated cluster labels and the remaining unlabeled data points. Lastly, by using the trained model, we obtain the estimated cluster labels of all given unlabeled data points. The advantage of our method is that it does not require key conditions. Existing clustering methods with deep neural networks assume that the cluster balance of a given dataset is uniform. Moreover, it also can be applied to various data domains as long as the data is expressed by a feature vector. In addition, it is observed that our method is robust against outliers. Therefore, the proposed method is expected to perform, on average, better than previous methods. We conducted numerical experiments on five commonly used datasets to confirm the effectiveness of the proposed method.
Highlights
Clustering is one of the oldest machine-learning fields, where the objective is, given data points, to group them into clusters according to some measure
This special type of SC is named as Selective Geodesic Spectral Clustering (SGSC), which we propose for assisting Spectral Embedded Deep Clustering (SEDC) as well
We propose a deep clustering method named SEDC
Summary
Clustering is one of the oldest machine-learning fields, where the objective is, given data points, to group them into clusters according to some measure. Thanks to the development of deep neural networks, we can handle large datasets with complicated shapes [9]. One major direction in the studies is to combine deep AutoEncoders (AE) [10] with classical clustering methods This AE is used to obtain a clustering friendly low dimensional representation. Another major direction is directly grouping a given unlabeled dataset into the clusters in the original input space by employing a deep neural network to model the distribution of cluster labels. With both directions, there exist popular methods.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.