Topical document clustering: two-stage post processing technique

Divyansh Bhatia, N Mehala, Poonam Goyal, Navneet Goyal

doi:10.1504/ijdmmm.2018.10013658

Abstract

Clustering documents is an essential step in improving the efficiency and effectiveness of information retrieval systems. We propose a two-phase split-merge (SM) algorithm that can be applied to topical clusters obtained from existing query-context-aware document clustering algorithms to produce soft topical document clusters. SM is a post-processing technique that combines the advantages of the document-pivot and feature-pivot approaches to topical document clustering. The split phase splits the topical clusters by relating them to topics obtained by disambiguating web search results, converting them into homogeneous soft clusters. In the merge phase, similar clusters are merged using a feature-pivot approach. SM is tested on the output of two hierarchical query-context-aware document clustering algorithms on different datasets, including the TREC Session Track 2011 dataset. The resulting topical clusters are also updated incrementally as the data stream progresses. The proposed algorithm improves clustering quality appreciably in all the experiments conducted.
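
The split and merge phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the thresholds, or how topics are represented, so the Jaccard similarity over term sets, the function names `split_phase` and `merge_phase`, the parameters `doc_threshold` and `merge_threshold`, and the toy data below are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def split_phase(clusters, docs, topics, doc_threshold=0.15):
    """Split each input cluster by topic (assumed rule, not from the paper).

    A document joins every topic it is similar enough to, so the
    resulting clusters are soft (a document may appear in several).
    """
    soft = {}
    for cluster_id, doc_ids in clusters.items():
        for topic_id, topic_terms in topics.items():
            members = {d for d in doc_ids
                       if jaccard(docs[d], topic_terms) >= doc_threshold}
            if members:
                soft[(cluster_id, topic_id)] = members
    return soft


def merge_phase(soft, docs, merge_threshold=0.45):
    """Greedily merge clusters whose pooled term sets (features) are similar."""
    groups = [set(members) for members in soft.values()]

    def features(group):
        # Pool the terms of all documents in the group.
        return set().union(*(docs[d] for d in group))

    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(groups)), 2):
            if jaccard(features(groups[i]), features(groups[j])) >= merge_threshold:
                groups[i] |= groups[j]
                del groups[j]
                merged = True
                break  # indices are stale after a merge, so restart the scan
    return groups


# Hypothetical toy data: documents as term sets, two input clusters,
# and two topics for the ambiguous query "jaguar".
docs = {
    "d1": {"jaguar", "car", "engine", "speed"},
    "d2": {"jaguar", "cat", "jungle", "prey"},
    "d3": {"car", "engine", "speed", "dealer"},
    "d4": {"jaguar", "car", "cat"},
}
clusters = {"c1": {"d1", "d2", "d4"}, "c2": {"d3"}}
topics = {"auto": {"car", "engine", "speed"}, "animal": {"cat", "jungle", "prey"}}

soft_clusters = split_phase(clusters, docs, topics)
result = merge_phase(soft_clusters, docs)
print(sorted(sorted(group) for group in result))
```

In this toy run the ambiguous document `d4` ends up in both final clusters, which is what "soft" clustering means here, and the two car-related clusters that started in different input clusters are merged.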

More From: International Journal of Data Mining, Modelling and Management

Similar Papers

How many performance measures to evaluate information retrieval systems?
Alain Baccini ... Laetitia Lafage
Knowledge and Information Systems | VOL. 30
12 Apr 2011

CHMM Object Detection Based on Polygon Contour Features by PSM.
Shufang Zhuo ... Yanwei Huang
Sensors (Basel, Switzerland) | VOL. 22
30 Aug 2022

A Grey Wolf Optimizer for Text Document Clustering
Hasan Rashaideh ... Laith Mohammad Abualigah
Journal of Intelligent Systems | VOL. 29
21 Jul 2018

Incremental blind feedback
Jiaul H Paik ... Swapan K Parui
ACM Transactions on Asian Language Information Processing | VOL. 13
03 Oct 2014
