Abstract
As the number of social media users grows, microblogs are generated and shared at record levels. The high velocity and large volume of these short texts introduce redundancy and noise, making it difficult for users and analysts to extract the information they are interested in. In this paper, we study query-focused summarization as a solution to this problem and propose a novel summarization framework that generates personalized online summaries and historical summaries over arbitrary time durations. Our framework can handle dynamic, perpetual, and large-scale microblogging streams. Specifically, we propose an online microblogging stream clustering algorithm that clusters microblogs and maintains distilled statistics called Microblog Cluster Vectors (MCVs). We then develop a ranking method that extracts from the MCVs the sentences most representative of the query and generates a query-focused summary over an arbitrary time duration. Our experiments on large-scale real microblogs demonstrate the efficiency and effectiveness of our approach.
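To make the two-stage pipeline described above concrete, the following is a minimal sketch, not the paper's algorithm. It assumes bag-of-words term counts, cosine similarity, and a fixed similarity threshold, none of which are specified in the abstract. The names `ClusterSummary`, `StreamSummarizer`, `process`, and `summarize` are hypothetical, and `ClusterSummary` is only a simplified stand-in for an MCV, whose actual contents are not given here.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ClusterSummary:
    """Hypothetical stand-in for an MCV: compact per-cluster statistics."""
    term_sum: Counter = field(default_factory=Counter)  # summed term counts
    count: int = 0            # number of posts absorbed
    first_ts: float = 0.0     # timestamp of the first post
    last_ts: float = 0.0      # timestamp of the latest post
    representative: str = ""  # assumption: the first post stands for the cluster


class StreamSummarizer:
    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed similarity cutoff for joining
        self.clusters = []

    def process(self, text, ts):
        """Absorb one post: join the closest cluster or open a new one."""
        vec = Counter(tokenize(text))
        best, best_sim = None, 0.0
        for cluster in self.clusters:
            sim = cosine(vec, cluster.term_sum)
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is None or best_sim < self.threshold:
            best = ClusterSummary(first_ts=ts, representative=text)
            self.clusters.append(best)
        # Only the statistics are updated; the raw post is not stored.
        best.term_sum.update(vec)
        best.count += 1
        best.last_ts = ts

    def summarize(self, query, start, end, k=2):
        """Rank clusters active in [start, end] by similarity to the query."""
        q = Counter(tokenize(query))
        scored = [
            (cosine(q, c.term_sum), c.representative)
            for c in self.clusters
            if c.last_ts >= start and c.first_ts <= end
        ]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [text for _, text in scored[:k]]


stream = StreamSummarizer(threshold=0.3)
for ts, post in enumerate(
    [
        "earthquake hits city center",
        "strong earthquake shakes city",
        "new phone released today",
        "phone sales break records",
    ],
    start=1,
):
    stream.process(post, ts)

print(stream.summarize("earthquake city", start=0, end=10))
```

Because each post only updates per-cluster statistics, memory grows with the number of clusters rather than the number of posts, which is the property that makes a single pass over an unbounded stream feasible. The time filter here selects whole clusters by their first and last timestamps, which is coarser than a true summary over an arbitrary duration.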