Abstract
Topic models have been widely used in Topic Detection and Tracking tasks, which aim to detect, track, and describe topics from a stream of broadcast news reports. However, most existing topic models neglect semantic and syntactic information and lack readable topic descriptions. To exploit such information, Language Models (LMs) have been applied to many supervised NLP tasks, but they have not yet been extended to unsupervised topic clustering. Moreover, it is difficult to employ general LMs (e.g., BERT) to produce readable topic summaries because of the mismatch between their pretraining objectives and the summarization task. In this paper, noting the similarity between a document's content and its summary, we first propose a Language Model-based Topic Model (LMTM) for topic clustering, which uses an LM to generate deep contextualized word representations. We then introduce a new method for training a Topic Summarization Model that not only produces brief topic summaries but can also serve as the LM in LMTM for topic clustering. Empirical evaluations on two different datasets show that the proposed LMTM outperforms four baselines in terms of JC, FMI, precision, recall, and F1-score. In addition, the readable and reasonable summaries it generates validate the rationality of our model components.
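To make the clustering step described in the abstract concrete, the following is a minimal illustrative sketch rather than the authors' LMTM implementation: it mean-pools contextualized token representations from a pretrained BERT model (via the Hugging Face transformers library) into document vectors, clusters them with k-means, and scores the clustering with a pair-counting Jaccard Coefficient (JC) and the Fowlkes-Mallows Index (FMI). The model name, pooling strategy, toy documents, and cluster count are all assumptions made for illustration.

# Illustrative sketch only: contextualized embeddings + k-means clustering,
# evaluated with pair-level JC and FMI; not the paper's LMTM implementation.
import torch
from transformers import AutoTokenizer, AutoModel       # pretrained BERT encoder
from sklearn.cluster import KMeans
from sklearn.metrics import fowlkes_mallows_score
from sklearn.metrics.cluster import pair_confusion_matrix

def embed(texts, model_name="bert-base-uncased"):
    """Mean-pool BERT's last hidden layer into one vector per document."""
    tok = AutoTokenizer.from_pretrained(model_name)
    enc = AutoModel.from_pretrained(model_name).eval()
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = enc(**batch).last_hidden_state          # (batch, tokens, hidden)
    mask = batch["attention_mask"].unsqueeze(-1)         # ignore padding positions
    return (hidden * mask).sum(1) / mask.sum(1)          # (batch, hidden)

def pair_jaccard(y_true, y_pred):
    """Pair-counting Jaccard Coefficient: TP / (TP + FP + FN) over document pairs."""
    (_, fp), (fn, tp) = pair_confusion_matrix(y_true, y_pred)
    return tp / (tp + fp + fn)

docs = ["quake hits coastal city", "earthquake relief effort begins",
        "stock market rallies", "shares surge on tech earnings"]
gold = [0, 0, 1, 1]                                      # assumed gold topic labels
pred = KMeans(n_clusters=2, n_init=10).fit_predict(embed(docs).numpy())
print("JC :", pair_jaccard(gold, pred))
print("FMI:", fowlkes_mallows_score(gold, pred))

The same interface makes it straightforward to swap the general-purpose BERT encoder for the lighter summarization encoder described in the Highlights below.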
Highlights
With the rapid development of Internet technology, online media has become an important way for the public to publish and obtain information.
We propose using Language Models (LMs), which have been applied to many supervised NLP tasks but not yet to unsupervised topic clustering, to overcome a key limitation of traditional topic models, namely that they do not make full use of semantic and syntactic information, and we detail the Language Model-based Topic Model (LMTM) built on a modern LM such as BERT.
Considering that general LMs are too large to be flexible and efficient, and that their generic pretraining objectives do not directly serve topic clustering or summarization, we propose a text summarization model whose encoder can be used as a light and flexible LM.
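The following conceptual sketch, again an assumption-laden illustration rather than the paper's architecture, shows what such a dual-purpose model could look like: a small Transformer encoder-decoder is trained to generate summaries, and its encoder alone is later reused as a lightweight document encoder (an LM stand-in) for topic clustering. All layer sizes, the vocabulary size, and the toy input are invented for the example.

# Conceptual sketch (not the paper's architecture): a compact seq2seq summarizer
# whose encoder is reusable on its own as a light, flexible document encoder.
import torch
import torch.nn as nn

class LightSummarizer(nn.Module):
    def __init__(self, vocab_size=30522, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.generator = nn.Linear(d_model, vocab_size)  # predicts summary tokens

    def encode(self, src_ids):
        """Reusable as a light LM: contextualized representations of a document."""
        return self.encoder(self.embed(src_ids))

    def forward(self, src_ids, tgt_ids):
        memory = self.encode(src_ids)
        causal = torch.triu(torch.full((tgt_ids.size(1), tgt_ids.size(1)),
                                       float("-inf")), diagonal=1)
        out = self.decoder(self.embed(tgt_ids), memory, tgt_mask=causal)
        return self.generator(out)                       # logits over summary vocabulary

# After training on (article, summary) pairs with a cross-entropy loss on forward(),
# encode() alone yields document representations for topic clustering.
model = LightSummarizer()
doc_ids = torch.randint(0, 30522, (1, 64))               # toy token ids (assumed tokenizer)
doc_vec = model.encode(doc_ids).mean(dim=1)              # one pooled vector per document
print(doc_vec.shape)                                     # torch.Size([1, 256])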
Summary
With the rapid development of Internet technology, online media has become an important way for the public to publish and obtain information. Since the number of news reports is very large, and different articles have different value and attract different amounts of attention, articles reporting important and hot events may be overshadowed by those of less value. For news related to an ongoing event, it is also difficult for readers to link the current report with previous articles, which makes it hard for them to follow the development of the event. Automatically extracting hot events from massive news reports and linking them with related topics is therefore an urgent issue [1], [2].