Abstract

Automatic related work generation is a new challenge in multi-document scientific summarization that focuses on producing a related work section for a given scientific paper. In this paper, we propose a new framework, ToC-RWG, for related work generation that incorporates a topic model and citation information. We present an unsupervised generative probabilistic model, called QueryTopicSum, which uses an LDA-style model to characterize the generative process of both the scientific paper and its reference papers. We also take advantage of the citations of reference papers to identify Cited Text Spans (CTS) in those reference papers. This approach lets us annotate the importance of the reference papers from the perspective of the academic community. With QueryTopicSum and the identified CTS as candidate sentences, an optimization framework based on minimizing KL divergence is applied to select the most representative sentences for related work generation. Our evaluation on a set of 50 scientific papers along with their corresponding reference papers shows that ToC-RWG achieves a considerable improvement over generic multi-document summarization and scientific summarization baselines.
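As an illustration of KL-divergence-based sentence selection, the following is a minimal sketch of a generic greedy selector over smoothed unigram distributions. It is not the ToC-RWG optimization itself: the function names (`unigram_dist`, `kl_divergence`, `greedy_kl_select`), the smoothing constant `alpha`, the sentence budget and the toy data are all assumptions made for this example, and the target distribution here is a plain word distribution rather than one learned by QueryTopicSum.

```python
import math
from collections import Counter


def unigram_dist(tokens, vocab, alpha=0.01):
    """Additively smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)


def greedy_kl_select(sentences, target_dist, vocab, max_sentences=2):
    """Greedily add the sentence that most lowers KL(target || summary)."""
    selected, summary_tokens = [], []
    remaining = list(range(len(sentences)))
    while remaining and len(selected) < max_sentences:
        best = min(
            remaining,
            key=lambda i: kl_divergence(
                target_dist, unigram_dist(summary_tokens + sentences[i], vocab)
            ),
        )
        selected.append(best)
        summary_tokens = summary_tokens + sentences[best]
        remaining.remove(best)
    return selected


# Toy candidate sentences (already tokenized); purely hypothetical data.
sentences = [
    ["topic", "model", "lda"],
    ["citation", "span"],
    ["weather", "rain"],
]
vocab = sorted({w for s in sentences for w in s})

# Hypothetical target distribution, standing in for what a topic model would supply.
target_dist = unigram_dist(
    ["topic", "model", "lda", "topic", "model", "citation"], vocab
)

selected = greedy_kl_select(sentences, target_dist, vocab, max_sentences=2)
print(selected)
```

Smoothing keeps every vocabulary word at non-zero probability in the summary distribution, so the divergence stays finite even when the partial summary does not yet cover a word that the target distribution contains.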

Highlights

  • The related work section is a significant component of a scientific paper

  • Hoang et al. [1] selected sentences using a manually constructed hierarchical topic tree; Hu and Wan [2] used a global optimization framework for related work generation; Wang et al. [3] considered the contextual relevance within the target paper and the references among several kinds of objects such as papers, authors and keywords

  • In this paper, we propose a novel model, ToC-RWG, for automatic related work generation, which combines a topic model with citation information


Summary

INTRODUCTION

The related work section is a significant component of a scientific paper. Scholars need to contextualize their work within the related research and highlight their contributions. Hoang et al. [1] selected sentences using a manually constructed hierarchical topic tree; Hu and Wan [2] used a global optimization framework for related work generation; Wang et al. [3] considered the contextual relevance within the target paper and the references among several kinds of objects such as papers, authors and keywords. Specifically, Wang et al. [3] developed a neural data-driven summarizer with a joint context-driven attention mechanism to generate the related work section. They constructed a directed graph containing heterogeneous relations among objects such as papers, authors, keywords, and venues, and designed an attention mechanism focusing on the contextual relevance between the target paper being written and the graph. All of the above methods leave the connection between the target paper and its reference papers out of consideration, which is exactly the gap our work addresses.

BAYESIAN APPROACHES IN SUMMARIZATION
MULTI-DOCUMENT SCIENTIFIC SUMMARIZATION
QUERYTOPICSUM
EXPERIMENTS AND RESULTS
CONCLUSION