Abstract
In the Query Focused Multi-Document Summarization (QF-MDS) task, a set of documents and a query are given, and the goal is to generate a summary of these documents that answers the query. A major challenge for this task is the lack of labeled training datasets. To overcome this issue, we propose a novel weakly supervised learning approach that uses distant supervision. In particular, we use datasets similar to the target dataset as training data, and we leverage pre-trained sentence similarity models to generate a weak reference summary for each individual document in a document set from the multi-document gold reference summaries. We then iteratively train our summarization model on each single document, which alleviates the computational complexity of training neural summarization models on multiple documents (i.e., long sequences) at once. Experimental results on the Document Understanding Conferences (DUC) datasets show that our proposed approach achieves new state-of-the-art results across various evaluation metrics.
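The weak-labeling step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a bag-of-words cosine similarity as a stand-in for the pre-trained sentence similarity model, and the function names (`cosine`, `weak_reference_summary`) and the `top_k` cutoff are hypothetical choices made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Bag-of-words cosine similarity.

    Assumption: this stands in for a pre-trained sentence similarity model.
    """
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def weak_reference_summary(document_sentences, gold_summary_sentences, top_k=2):
    """Pick the top_k sentences of one document that best match any sentence
    of the multi-document gold summary (hypothetical selection rule)."""
    scored = []
    for idx, sent in enumerate(document_sentences):
        score = max((cosine(sent, g) for g in gold_summary_sentences), default=0.0)
        scored.append((score, idx))
    # Highest score first; ties are broken by position in the document.
    best = sorted(scored, key=lambda p: (-p[0], p[1]))[:top_k]
    # Return the selected sentences in their original document order.
    return [document_sentences[i] for _, i in sorted(best, key=lambda p: p[1])]


doc = [
    "The storm flooded several coastal towns.",
    "Local bakeries sold out of bread.",
    "Rescue teams evacuated hundreds of residents.",
]
gold = [
    "Coastal towns were flooded by the storm.",
    "Hundreds of residents were evacuated.",
]
weak = weak_reference_summary(doc, gold, top_k=2)
print(weak)
```

Running the selection once per document in a set yields one weak single-document reference per document, which is what makes training on one document at a time possible.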
Highlights
With the rapid growth of textual documents on the internet, accessing information from the web has become a challenging issue (Yao et al., 2017).
When using datasets similar to the target dataset as training data for the Query Focused Multi-Document Summarization (QF-MDS) task, we find that these datasets contain only multi-document gold summaries.
Of the two models, one is pre-trained for extractive summarization (PQSUMEXT) and the other for abstractive summarization (PQSUMABS).
Summary
With the rapid growth of textual documents on the internet, accessing information from the web has become a challenging issue (Yao et al., 2017). The QF-MDS task addresses this problem: the goal is to summarize a set of documents to answer a given query. In the QF-MDS task, the summaries generated by the summarizer can be either extractive or abstractive (Yao et al., 2017; Kulkarni et al., 2020). With the rising popularity of virtual assistants in recent years, there is growing interest in integrating abstractive summarization capabilities into these systems for natural response generation (Nishida et al., 2019).