Abstract
Relieving users of the overwhelming volume of news and letting them quickly and comprehensively grasp what is happening in the world each day, and how, is a necessary but challenging task. In this article, we develop a novel approach to multimedia news summarization for Internet search results, which uncovers the underlying topics in query-related news and threads the news events within each topic to generate a brief query-related overview. First, the hierarchical latent Dirichlet allocation (hLDA) model is used to discover the hierarchical topic structure of query-related news documents, and a new approach based on weighted aggregation and max pooling is proposed to identify one representative news article for each topic. One representative image is also selected to visualize each topic, complementing the text. Given the representative documents selected for each topic, a time-bias maximum spanning tree (MST) algorithm is proposed to thread them into a coherent and compact summary of their parent topic. Finally, we design a user-friendly interface that presents users with a hierarchical summary of the news they requested. Extensive experiments on a large-scale news dataset collected from multiple news Web sites demonstrate the encouraging performance of the proposed solution for news summarization in news retrieval.
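The threading step can be illustrated with a minimal sketch. The abstract does not specify how the time bias enters the edge weights, so the exponential discount below, the `decay` parameter, and all function names are assumptions made for illustration, not the authors' formulation. The sketch builds a maximum spanning tree with Prim's algorithm over articles whose pairwise similarity is discounted by their publication-time gap.

```python
import heapq
import math


def time_biased_weight(sim, t_a, t_b, decay=0.1):
    """Hypothetical weighting: similarity discounted by the time gap (e.g. in days)."""
    return sim * math.exp(-decay * abs(t_a - t_b))


def maximum_spanning_tree(n, weight):
    """Prim's algorithm on a complete graph of n nodes, keeping the heaviest edges.

    weight(i, j) returns the edge weight; the result is a list of (i, j, w) tree edges.
    """
    if n == 0:
        return []
    in_tree = {0}
    # heapq is a min-heap, so weights are negated to pop the heaviest edge first.
    heap = [(-weight(0, j), 0, j) for j in range(1, n)]
    heapq.heapify(heap)
    edges = []
    while heap and len(in_tree) < n:
        neg_w, i, j = heapq.heappop(heap)
        if j in in_tree:
            continue  # stale entry: j was already reached through a heavier edge
        in_tree.add(j)
        edges.append((i, j, -neg_w))
        for k in range(n):
            if k not in in_tree:
                heapq.heappush(heap, (-weight(j, k), j, k))
    return edges


def thread_articles(times, sims, decay=0.1):
    """Thread articles given publication times and a symmetric similarity matrix."""
    def w(i, j):
        return time_biased_weight(sims[i][j], times[i], times[j], decay)
    return maximum_spanning_tree(len(times), w)
```

With three equally similar articles published at times 0, 1, and 10, the discount makes the tree link each article to its nearest neighbor in time rather than connecting the first and last directly.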