Abstract

Abstractive multi-document summarization is a form of automatic text summarization that draws information from multiple documents and generates a human-like summary of them. In this paper, we propose an abstractive multi-document summarization method called HMSumm, which combines extractive and abstractive summarization approaches. First, it constructs an extractive summary from the multiple input documents, and then uses that summary to generate the abstractive one. Redundant information, a pervasive problem in multi-document summarization, is managed in the first step: specifically, a determinantal point process (DPP) is used to handle redundancy. This step also bounds the length of the input sequence for the abstractive summarization process, which has two benefits: it reduces the computational time, and it preserves the important parts of the input documents for the abstractive summarizer. We employ a deep submodular network (DSN) to determine the quality of the sentences in the extractive summary, and use BERT-based similarities to compute the redundancy. The obtained extractive summary is fed into the pre-trained BART and T5 models to generate two abstractive summaries, and the diversity of the sentences in each summary is used to select one of them as the final abstractive summary. To evaluate the performance of HMSumm, we use both human evaluations and ROUGE-based assessments, and compare it with several state-of-the-art methods on the DUC 2002, DUC 2004, Multi-News, and CNN/DailyMail datasets. The experimental results show that HMSumm outperforms these state-of-the-art algorithms.
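The redundancy-aware extractive step described above can be illustrated with a minimal sketch of greedy MAP inference for a DPP, where the kernel entry for a sentence pair combines per-sentence quality with pairwise similarity. The `quality` scores and `sim` matrix below are hypothetical stand-ins; in HMSumm itself, quality comes from the deep submodular network and similarities from BERT embeddings.

```python
# Hedged sketch (not the paper's implementation): greedy MAP inference for
# a determinantal point process over sentences. The DPP kernel is
#   L[i][j] = quality[i] * sim[i][j] * quality[j],
# so selecting a subset with a large determinant favors high-quality,
# mutually dissimilar sentences.

def det(m):
    """Determinant via Gaussian elimination with partial pivoting
    (pure Python; intended for the small submatrices used here)."""
    m = [row[:] for row in m]
    n = len(m)
    d = 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        if abs(m[p][i]) < 1e-12:
            return 0.0
        if p != i:
            m[i], m[p] = m[p], m[i]
            d = -d
        d *= m[i][i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= f * m[i][c]
    return d

def dpp_select(quality, sim, k):
    """Greedily add the sentence that most increases the determinant of
    the selected submatrix of L, until k sentences are chosen."""
    n = len(quality)
    L = [[quality[i] * sim[i][j] * quality[j] for j in range(n)]
         for i in range(n)]
    selected = []
    for _ in range(k):
        best, best_det = None, -1.0
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sub = [[L[a][b] for b in idx] for a in idx]
            d = det(sub)
            if d > best_det:
                best, best_det = i, d
        if best is None:
            break
        selected.append(best)
    return selected

# Toy example: sentences 0 and 1 are near-duplicates (similarity 0.95),
# sentence 2 is distinct. Picking 2 sentences skips the duplicate.
quality = [0.9, 0.8, 0.85]
sim = [[1.0, 0.95, 0.1],
       [0.95, 1.0, 0.1],
       [0.1, 0.1, 1.0]]
print(dpp_select(quality, sim, 2))  # → [0, 2]
```

In the toy run, the greedy step first takes the highest-quality sentence 0, then prefers the dissimilar sentence 2 over the near-duplicate sentence 1, since the duplicate would shrink the kernel determinant.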
