Abstract

Sentence clustering plays a pivotal role in theme-based summarization, which discovers topic themes defined as the clusters of highly related sentences in order to avoid redundancy and cover more diverse information. As the length of sentences is short and the content it contains is limited, the bag-of-words cosine similarity traditionally used for document clustering is no longer reasonably suitable. Special treatment for measuring sentence similarity is necessary. In this paper, we propose a ranking-based clustering framework that utilizes ranking distribution of documents and terms to help generate high quality sentence clusters. The effectiveness of the proposed framework is demonstrated by both the cluster quality analysis and the summarization evaluation conducted on the DUC 2004 and DUC2007 datasets.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.