Using Text Segmentation to Enhance the Cluster Hypothesis

Sylvain Lamprier, Bernard Levrat, Tassadit Amghar, Frédéric Saubion

doi:10.1007/978-3-540-85776-1_7

Abstract

An alternative way to tackle Information Retrieval, called Passage Retrieval, considers text fragments independently rather than assessing the global relevance of documents. In such a context, a document is not penalized when its relevant information is surrounded by text that deviates from the topic of interest. In this paper, we study the impact of taking these text fragments into account in a document clustering process. The use of clustering in Information Retrieval is mainly supported by the cluster hypothesis, which states that relevant documents tend to be more similar to one another than to non-relevant documents, so that a clustering process is likely to group them together. Previous experiments have shown that clustering the top-ranked documents retrieved in response to a user's query allows Information Retrieval systems to improve their effectiveness. In the clustering processes used in these studies, documents were considered globally. However, the assumption that a document can refer to more than one topic or concept may also affect the document clustering process. Considering the passages of the retrieved documents separately may make it possible to create clusters that better represent the topics addressed. Different approaches have been assessed, and the results show that using text fragments in the clustering process may indeed prove relevant.
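The passage-level idea can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not specify the segmentation or clustering algorithms, so the sketch assumes fixed-size word windows as a stand-in for text segmentation, raw term-frequency vectors compared with cosine similarity, and a simple single-pass (leader) clustering with a hypothetical similarity threshold. The names `passages`, `cosine` and `cluster_passages` are illustrative only. Because each passage is clustered on its own, a document whose passages address different topics can end up in several clusters instead of being forced into one.

```python
import math
from collections import Counter


def passages(text, size=50):
    """Split a document into consecutive windows of `size` words.

    Assumption: fixed-size windows stand in for a real text segmentation step.
    """
    words = text.lower().split()
    return [words[i:i + size] for i in range(0, len(words), size)]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_passages(docs, threshold=0.3, size=50):
    """Single-pass clustering of passages (hypothetical parameters).

    Each passage joins the most similar existing cluster if the similarity
    reaches `threshold`; otherwise it starts a new cluster. Returns, for each
    cluster, the set of document ids whose passages fell into it, so one
    document may appear in several clusters.
    """
    centroids = []  # summed term-frequency vector of each cluster
    members = []    # document ids contributing passages to each cluster
    for doc_id, text in docs.items():
        for words in passages(text, size):
            vec = Counter(words)
            best, best_sim = None, threshold
            for i, centroid in enumerate(centroids):
                sim = cosine(vec, centroid)
                if sim >= best_sim:
                    best, best_sim = i, sim
            if best is None:
                centroids.append(vec)
                members.append({doc_id})
            else:
                centroids[best].update(vec)
                members[best].add(doc_id)
    return members


if __name__ == "__main__":
    retrieved = {
        "d1": "apple banana apple banana",
        "d2": "car engine car engine",
        "d3": "apple banana car engine",  # mixes both topics
    }
    clusters = cluster_passages(retrieved, size=2)
    print([sorted(m) for m in clusters])  # [['d1', 'd3'], ['d2', 'd3']]
```

In this toy run, a whole-document clustering would have to place `d3` in a single cluster, whereas at passage level it is attached to both topics. A realistic setup would replace the fixed windows with an actual segmentation algorithm and use weighted (e.g. tf-idf) vectors.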

Similar Papers

The cluster hypothesis in information retrieval
Oren Kurland
28 Jul 2013

Discrepancy-Based Method for Hierarchical Distributed Optimization
Jonathan Gaudreault ... Jean-Marc Frayret
01 Oct 2007

Document Length Normalization by Statistical Regression
Sylvain Lamprier ... Tassadit Amghar
01 Oct 2007

RETRACTED ARTICLE: Hybrid tolerance rough fuzzy set with improved monkey search algorithm based document clustering
Torki Altameem ... Mohammed Amoon
Journal of Ambient Intelligence and Humanized Computing | VOL. 15
31 Aug 2018
