Enhancing sentence-level clustering with ranking-based clustering framework for theme-based summarization

Libin Yang, Xiaoyan Cai, Yang Zhang, Peng Shi

doi:10.1016/j.ins.2013.11.026

Abstract

Sentence clustering plays a pivotal role in theme-based summarization, which discovers topic themes, defined as clusters of highly related sentences, in order to avoid redundancy and cover more diverse information. Because sentences are short and carry limited content, the bag-of-words cosine similarity traditionally used for document clustering is no longer suitable, and sentence similarity requires special treatment. In this paper, we propose a ranking-based clustering framework that uses the ranking distributions of documents and terms to help generate high-quality sentence clusters. The effectiveness of the proposed framework is demonstrated by both cluster quality analysis and summarization evaluation conducted on the DUC 2004 and DUC 2007 datasets.

Full Text