Abstract

Considering the complexity of clustering text datasets in terms of informal user generated content and the fact that there are multiple labels for each data point in many informal user generated content datasets, this paper focuses on Non-negative Matrix Factorization (NMF) algorithms for Overlapping Clustering of customer inquiry and review data, which has seldom been discussed in previous literature. We extend the use of Semi-NMF and Convex-NMF to Overlapping Clustering and develop a procedure of applying SemiNMF and Convex-NMF on Overlapping Clustering of text data. The developed procedure is tested based on customer review and inquiry datasets. The results of comparing SemiNMF and Convex-NMF with a baseline model demonstrate that they have advantages over the baseline model, since they do not need to adjust parameters to obtain similarly strong clustering performances. Moreover, we compare different methods of picking labels for generating Overlapping Clustering results from Soft Clustering algorithms, and it is concluded that thresholding by mean method is a simpler and relatively more reliable method compared to maximum n method.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call