Abstract

Nonnegative Matrix Factorization (NMF), originally designed for dimensionality reduction, has over the years received a tremendous amount of attention for clustering purposes in fields such as image processing and text mining. However, despite its mathematical elegance and simplicity, NMF has a major drawback: it is strongly sensitive to its starting point, and therefore often struggles to converge to an optimal solution. Moreover, we observed that even with a meaningful initialization, selecting the solution with the best local minimum does not always yield the best clustering quality; a better clustering can sometimes be obtained from a solution that is slightly worse in terms of the criterion. In this paper, we therefore study the clustering characteristics and quality of a set of best NMF solutions, and propose a method that delivers a better partition by building a consensus from these best NMF solutions.
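The procedure described in the abstract — running NMF from several starting points, keeping the solutions with the best objective value, and combining their partitions into a consensus — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the choice of `n_best`, the argmax-on-W cluster assignment, and the co-association/average-linkage consensus step are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import NMF
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def nmf_consensus(X, n_clusters, n_runs=10, n_best=5, seed=0):
    """Illustrative sketch: consensus over the best-criterion NMF runs."""
    rng = np.random.RandomState(seed)
    runs = []
    for _ in range(n_runs):
        model = NMF(n_components=n_clusters, init="random",
                    random_state=rng.randint(1 << 31), max_iter=500)
        W = model.fit_transform(X)
        labels = W.argmax(axis=1)           # cluster = dominant latent factor
        runs.append((model.reconstruction_err_, labels))
    runs.sort(key=lambda t: t[0])           # lowest reconstruction error first
    best = [labels for _, labels in runs[:n_best]]
    # Co-association matrix: fraction of the best runs in which documents
    # i and j are assigned to the same cluster.
    n = X.shape[0]
    coassoc = np.zeros((n, n))
    for labels in best:
        coassoc += (labels[:, None] == labels[None, :])
    coassoc /= len(best)
    # Consensus partition: cut an average-linkage tree built on the
    # (1 - co-association) distances into n_clusters groups.
    dist = squareform(1.0 - coassoc, checks=False)
    Z = linkage(dist, method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust") - 1
```

Averaging co-assignments over only the `n_best` runs, rather than all of them, mirrors the paper's idea of building the consensus from the best NMF solutions instead of trusting the single best local minimum.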

Highlights

  • When dealing with text data, document clustering techniques make it possible to divide a set of documents into groups, so that documents assigned to the same group are more similar to each other than to documents assigned to other groups [12,18,21,22]

  • This hypothesis can be used at different stages of the information retrieval process, the two most notable being cluster-based retrieval, to speed up search, and search result clustering, to help users navigate and understand the search results

  • Using cluster ensembles, we propose a simple method to obtain a better clustering with Nonnegative Matrix Factorization (NMF) algorithms on text data


Summary

Introduction

When dealing with text data, document clustering techniques make it possible to divide a set of documents into groups, so that documents assigned to the same group are more similar to each other than to documents assigned to other groups [12,18,21,22]. On real, already labeled data, many papers evaluate the performance of clustering algorithms using indices such as Accuracy (ACC), Normalized Mutual Information (NMI) [25] and Adjusted Rand Index (ARI) [14]. Most clustering algorithms are iterative and require several initializations; the resulting partition is the one optimizing the objective function. Yet in some of these works, we observe comparative studies between methods based on the maximum ACC/NMI/ARI measures obtained over several initializations, rather than on the criterion optimized by the algorithm. This remark leads us to consider an ensemble method that is widely used in supervised learning [11,24] but somewhat less in unsupervised learning [25]. Although this approach, referred to as consensus clustering, is often used to compare partitions obtained with different algorithms, it is less studied when the partitions come from the same algorithm.
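The external indices named above can be computed directly with scikit-learn; the toy label vectors below are made up for illustration and are not taken from the paper's datasets.

```python
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

true_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
pred_labels = [1, 1, 1, 0, 0, 2, 2, 2, 2]  # a permuted, imperfect clustering

nmi = normalized_mutual_info_score(true_labels, pred_labels)
ari = adjusted_rand_score(true_labels, pred_labels)
# Both indices are invariant to cluster relabeling, which is why they are
# used to compare a predicted partition against ground-truth classes.
```

Note that both scores reach 1 only for a partition identical (up to relabeling) to the ground truth, which is what makes maximizing them over initializations a different target from minimizing the NMF criterion.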

Nonnegative Matrix Factorization
Experiments
Datasets
NMF Raw Performances and Initialization
Consensus Clustering
Consensus Multinomial
Conclusion

