Soft document clustering using a novel graph covering approach

Jens Dörpinghaus,Marc Jacobs,Sebastian Schaaf

doi:10.1186/s13040-018-0172-x


Open Access

https://doi.org/10.1186/s13040-018-0172-x


Abstract

Background: In text mining, document clustering describes the efforts to assign unstructured documents to clusters, which in turn usually refer to topics. Clustering is widely used in science for data retrieval and organisation.

Results: In this paper we present and discuss a novel graph-theoretical approach for document clustering and its application to a real-world data set. We show that the well-known graph partition into stable sets or cliques can be generalized to pseudostable sets or pseudocliques. This allows soft clustering as well as hard clustering to be performed. The software is freely available on GitHub.

Conclusions: The presented integer linear programming formulation and the greedy approach for this NP-complete problem lead to valuable results on random instances and some real-world data for different similarity measures. We could show that PS-Document Clustering is a remarkable approach to document clustering and opens the complete toolbox of graph theory to this field.

Highlights

In text mining, document clustering describes the efforts to assign unstructured documents to clusters, which in turn usually refer to topics
For the clustering use case, we study MEDLINE abstracts and associated metadata that are processed by ProMiner, a named entity recognition system ([38]), and indexed by the semantic information retrieval platform SCAIView ([39])
We have shown a novel approach for document clustering considering hard clustering as well as soft clustering

Summary

Introduction

Document clustering describes the efforts to assign unstructured documents to clusters, which in turn usually refer to topics. Clustering is widely used in science for data retrieval and organisation. Soft Document Clustering using a graph partition into multiple pseudostable sets was introduced in [1]. We extend this approach with some fundamental theoretical additions, discuss the correct calculation of the bounds and ι, and discuss some output data. Document Clustering (also known as Text Clustering) is a specific application of text mining and a sub-problem of cluster analysis. The approach discussed in this paper can be applied to other clustering subjects, but text clustering is its most common use. The application of Document Clustering is a wide and open field, and in terms of complexity it is still under heavy research; see [2, 65ff] and [3, 47].
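To make the graph view of clustering concrete, the following is a minimal sketch of a hard clustering obtained by partitioning a document similarity graph into cliques. It is not the PS-Document Clustering method of the paper: the Jaccard similarity on token sets, the threshold, the greedy heuristic, and the function names `jaccard`, `similarity_graph` and `greedy_clique_partition` are all assumptions chosen for illustration, and the pseudostable-set generalization that enables soft clustering is not implemented here.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two token sets (illustrative similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_graph(docs, threshold):
    """Build adjacency sets: documents i and j are adjacent if their
    similarity reaches the (assumed) threshold."""
    tokens = [set(d.lower().split()) for d in docs]
    adj = {i: set() for i in range(len(docs))}
    for i, j in combinations(range(len(docs)), 2):
        if jaccard(tokens[i], tokens[j]) >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def greedy_clique_partition(adj):
    """Partition the vertices into cliques with a simple greedy heuristic.
    Each clique is one hard cluster; the result is not guaranteed minimal."""
    unassigned = set(adj)
    clusters = []
    while unassigned:
        # Seed with the unassigned vertex that has the most unassigned neighbours.
        seed = max(sorted(unassigned), key=lambda v: len(adj[v] & unassigned))
        clique = {seed}
        candidates = adj[seed] & unassigned
        while candidates:
            v = min(candidates)  # deterministic tie-breaking
            clique.add(v)
            candidates &= adj[v]  # keep only vertices adjacent to every member
        unassigned -= clique
        clusters.append(sorted(clique))
    return clusters


docs = [
    "protein kinase signalling pathway",
    "kinase signalling pathway inhibitor",
    "protein kinase pathway inhibitor",
    "graph clique partition",
    "graph clique cover",
]
clusters = greedy_clique_partition(similarity_graph(docs, threshold=0.5))
print(clusters)
```

In this toy input the first three documents are pairwise similar and form one clique, while the last two form another. Every document lands in exactly one cluster, which is what distinguishes this hard clustering from the soft clustering discussed in the paper.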

Journal: BioData Mining
Publication Date: Jun 14, 2018
Citations: 7
License type: open-access

Similar Papers

Document Clustering using a Graph Covering with Pseudostable Sets
Jens Dörpinghaus ... Marc Jacobs
24 Sep 2017

Soft and hard hybrid balanced clustering with innovative qualitative balancing approach
Seyed Alireza Mousavian Anaraki ... Abdorrahman Haeri
Information Sciences | VOL. 613
27 Sep 2022

Local quality functions for graph clustering with non-negative matrix factorization.
Twan Van Laarhoven ... Elena Marchiori
Physical Review E | VOL. 90
29 Dec 2015

From Lab to Real World: Assessing the Effectiveness of Human Activity Recognition and Optimization through Personalization.
Marija Stojchevska ... Femke Ongenae
Sensors | VOL. 23
09 May 2023
