Multi-Document Summarization (MDS) is challenging because the input documents are not only extremely long but may also overlap, complement, or contradict one another. In this paper, we propose to capture complex cross-document interactions to handle lengthy inputs for better multi-document summarization. Specifically, we present MDS-MGRE, a coarse-to-fine MDS framework that introduces Multi-Granularity Relationships into an Extract-then-summarize pipeline. In the coarse-grained stage, multi-granularity embedding, heterogeneous graph construction, and MGRExtractor work together to convert redundant multi-documents into compact meta-documents. We first use the pre-trained language model BERT to obtain semantically rich embeddings at different granularities: documents, paragraphs, sentence-sets, and sentences. Then, we construct a heterogeneous graph with four types of nodes (document, paragraph, sentence-set, and sentence nodes) and corresponding connecting edges to model rich document relationships. Furthermore, we propose a novel Multi-Granularity Relationship-based Extractor (MGRExtractor) that produces meta-documents by efficiently pruning the heterogeneous graph. It consists of four main modules: noise removal, redundancy removal, multi-granularity scoring, and sentence-set selection. In the fine-grained stage, we employ the large configuration of BART as our abstractive summarizer to generate system summaries from the extracted meta-documents. Experimental results on two benchmark datasets show that our framework significantly outperforms strong baselines with comparable parameters, and slightly underperforms methods with a maximum encoding length of 16,384 tokens.
On Multi-News and WCEP, automatic evaluation shows that MDS-MGRE achieves average performance improvements of 1.75% and 8.77%, respectively, over state-of-the-art systems with comparable parameters. These positive results demonstrate the benefits of modeling rich document relationships to generate high-quality meta-documents for MDS.
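
To make the coarse-grained stage concrete, the following is a minimal, dependency-free sketch of a four-level heterogeneous graph and a four-step pruning pass. It is an illustration under stated assumptions, not the MGRExtractor implementation: the abstract does not define how sentence-sets are formed, which edges exist, or how scores are computed. Here we assume sentence-sets are fixed-size windows of consecutive sentences, edges are containment links only, and Jaccard token overlap stands in for BERT embedding similarity. All names (`build_graph`, `extract_meta_document`, `set_size`, `min_tokens`, `max_overlap`, `budget`) are hypothetical.

```python
import re
from dataclasses import dataclass, field


def tokens(text):
    """Lowercased word tokens; a toy stand-in for BERT embeddings."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    """Jaccard overlap of two token lists, in [0, 1]."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # node id -> (node type, text)
    edges: set = field(default_factory=set)    # (parent id, child id)

    def add(self, node_id, node_type, text, parent=None):
        self.nodes[node_id] = (node_type, text)
        if parent is not None:
            self.edges.add((parent, node_id))

    def parent(self, node_id):
        return next((p for p, c in self.edges if c == node_id), None)


def build_graph(documents, set_size=2):
    """documents: list of documents, each a list of paragraph strings."""
    g = Graph()
    for d, paragraphs in enumerate(documents):
        doc_id = f"d{d}"
        g.add(doc_id, "document", " ".join(paragraphs))
        for p, para in enumerate(paragraphs):
            para_id = f"{doc_id}.p{p}"
            g.add(para_id, "paragraph", para, parent=doc_id)
            sents = [s.strip() for s in re.split(r"(?<=[.!?])\s+", para) if s.strip()]
            # Assumption: a sentence-set is a window of consecutive sentences.
            for k in range(0, len(sents), set_size):
                chunk = sents[k:k + set_size]
                set_id = f"{para_id}.g{k // set_size}"
                g.add(set_id, "sentence_set", " ".join(chunk), parent=para_id)
                for s, sent in enumerate(chunk):
                    g.add(f"{set_id}.s{s}", "sentence", sent, parent=set_id)
    return g


def score(g, set_id):
    """Assumed scoring: mean overlap with the parent paragraph and document."""
    para_id = g.parent(set_id)
    doc_id = g.parent(para_id)
    toks = tokens(g.nodes[set_id][1])
    return (jaccard(toks, tokens(g.nodes[para_id][1]))
            + jaccard(toks, tokens(g.nodes[doc_id][1]))) / 2


def extract_meta_document(g, min_tokens=4, max_overlap=0.6, budget=40):
    """Return the texts of the selected sentence-sets in input order."""
    sets = [n for n, (t, _) in g.nodes.items() if t == "sentence_set"]
    # 1. Noise removal: drop very short sentence-sets.
    sets = [n for n in sets if len(tokens(g.nodes[n][1])) >= min_tokens]
    # 2. Redundancy removal: keep the first of any near-duplicate group.
    kept = []
    for n in sets:
        toks = tokens(g.nodes[n][1])
        if all(jaccard(toks, tokens(g.nodes[m][1])) <= max_overlap for m in kept):
            kept.append(n)
    # 3. Multi-granularity scoring, then 4. greedy selection under a token budget.
    chosen, used = set(), 0
    for n in sorted(kept, key=lambda n: score(g, n), reverse=True):
        size = len(tokens(g.nodes[n][1]))
        if used + size <= budget:
            chosen.add(n)
            used += size
    return [g.nodes[n][1] for n in kept if n in chosen]


docs = [
    ["The storm hit the coast on Monday. Thousands of homes lost power.", "Ok."],
    ["The storm hit the coast on Monday. Thousands of homes lost power.",
     "Officials opened shelters for displaced residents across the region."],
]
print(extract_meta_document(build_graph(docs)))
```

In this sketch the resulting list of sentence-set texts plays the role of the meta-document that would be passed to the abstractive summarizer; the token `budget` is a stand-in for the summarizer's maximum encoding length.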