Sparse Poisson Latent Block Model for Document Clustering

Melissa Ailem, Mohamed Nadif, Francois Role

doi:10.1109/tkde.2017.2681669

Abstract

Over the last decades, several studies have demonstrated the importance of co-clustering for simultaneously producing groups of objects and features. Even when only object clusters are needed, co-clustering is often more effective than one-way clustering, especially on sparse, high-dimensional data. In this paper, we present a novel generative mixture model for co-clustering such data. This model, the Sparse Poisson Latent Block Model (SPLBM), is based on the Poisson distribution, which arises naturally for contingency tables such as document-term matrices. The advantages of SPLBM are twofold. First, it is a rigorous statistical model that is also very parsimonious. Second, it has been designed from the ground up to deal with data sparsity problems. As a consequence, in addition to seeking homogeneous blocks, as other available algorithms do, it also filters out blocks that are homogeneous but noisy because of the sparsity of the data. Experiments on various datasets of different sizes and structures show that an algorithm based on SPLBM clearly outperforms state-of-the-art algorithms. Most notably, the SPLBM-based algorithm presented here succeeds in retrieving the natural cluster structure of difficult, unbalanced datasets that other known algorithms are unable to handle effectively.
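To make the setting concrete, here is a minimal sketch of the kind of data the abstract describes: a sparse document-term count matrix with Poisson-distributed cells, where a few document-by-term blocks carry most of the counts and the remaining cells are low-rate noise. It only simulates such data and is not the estimation algorithm from the paper. The function names, the even split into groups, and the two rates (`rate_in`, `rate_out`) are assumptions chosen for illustration.

```python
import numpy as np


def sample_block_diagonal_counts(n_docs, n_terms, n_blocks, rate_in, rate_out, seed=0):
    """Draw a count matrix whose diagonal blocks are denser than the rest.

    Illustrative assumption: documents and terms are assigned to groups
    round-robin. A cell whose document and term share a group is drawn from
    Poisson(rate_in); every other cell is drawn from Poisson(rate_out).
    """
    rng = np.random.default_rng(seed)
    row_labels = np.arange(n_docs) % n_blocks
    col_labels = np.arange(n_terms) % n_blocks
    # Boolean mask of cells that fall inside a diagonal block.
    same_block = row_labels[:, None] == col_labels[None, :]
    rates = np.where(same_block, rate_in, rate_out)
    counts = rng.poisson(rates)
    return counts, row_labels, col_labels


def diagonal_mass(counts, row_labels, col_labels):
    """Fraction of all counts that fall inside the diagonal blocks."""
    same_block = row_labels[:, None] == col_labels[None, :]
    total = counts.sum()
    return counts[same_block].sum() / total if total > 0 else 0.0


counts, rows, cols = sample_block_diagonal_counts(
    n_docs=300, n_terms=600, n_blocks=3, rate_in=0.30, rate_out=0.01
)
sparsity = float((counts == 0).mean())
mass = float(diagonal_mass(counts, rows, cols))
print(f"zero cells: {sparsity:.1%}, counts inside diagonal blocks: {mass:.1%}")
```

With these hypothetical rates, most cells are zero while most of the total count sits inside the diagonal blocks, which is the sparse, block-structured situation a co-clustering method has to separate from noise.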

Journal: IEEE Transactions on Knowledge and Data Engineering
Publication Date: Jul 1, 2017
Citations: 38
