Abstract

Clustering large, high-dimensional document data has attracted great interest; however, current clustering algorithms lack efficient representation learning. Incorporating deep learning techniques into document clustering can strengthen the learning process. In this work, we address the representation learning problem by preserving important information from the initial data while pulling the original samples and their augmentations together on the one hand, and we handle the cluster locality preservation issue by pulling neighboring data points together on the other. To that end, we first introduce contractive autoencoders and then propose a deep embedding clustering framework based on the contractive autoencoder (DECCA) to learn document representations. To capture relevant document or word features, we append a Frobenius-norm penalty term to the conventional autoencoder objective, which helps the autoencoder perform better. In this way, the contractive autoencoder captures the local manifold structure of the input data and competes with the representations learned by existing methods. Finally, we confirm that our proposed algorithm outperforms state-of-the-art results on six real-world image and text datasets.
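
For concreteness, in the standard contractive autoencoder formulation (which the abstract appears to build on), the Frobenius-norm penalty is applied to the Jacobian of the encoder and added to the reconstruction loss. The sketch below assumes this standard form; the encoder f_\theta, decoder g_{\theta'}, and weighting hyperparameter \lambda are notational assumptions and may differ from the exact objective used in DECCA:

\[
\mathcal{L}_{\mathrm{CAE}}(x) \;=\; \big\lVert x - g_{\theta'}\!\big(f_{\theta}(x)\big) \big\rVert^{2} \;+\; \lambda \,\big\lVert J_{f}(x) \big\rVert_{F}^{2},
\qquad
\big\lVert J_{f}(x) \big\rVert_{F}^{2} \;=\; \sum_{i,j} \left( \frac{\partial f_{\theta}(x)_{j}}{\partial x_{i}} \right)^{2},
\]

where the penalty encourages the learned representation to be locally contractive, i.e., insensitive to small perturbations of the input, which is how the contractive autoencoder captures the local manifold structure of the data.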
