Abstract

Extractive query-focused multi-document summarization (QF-MDS) is the task of automatically generating an informative summary, from a collection of documents, that answers a given query. Sentence and query representation is a cornerstone that affects the effectiveness of many QF-MDS methods. Transfer learning using pre-trained word embedding models has shown promising performance in many applications. However, most of these representations do not consider the order of, or the semantic relationships between, words in a sentence, and thus they do not carry the meaning of a full sentence. In this paper, to address this issue, we propose to leverage transfer learning from pre-trained sentence embedding models to represent documents' sentences and users' queries as embedding vectors that capture the semantic and syntactic relationships between their constituents (words, phrases). Furthermore, BM25 and a semantic similarity function are linearly combined to retrieve a subset of sentences based on their relevance to the query. Finally, the maximal marginal relevance criterion is applied to re-rank the selected sentences, maintaining query relevance while minimizing redundancy. The proposed method is unsupervised, simple, efficient, and requires no labeled text summarization training data. Experiments are conducted on three standard datasets from the DUC evaluation campaigns (DUC 2005–2007). The results show that our method outperforms several state-of-the-art systems and achieves results comparable to the best performing systems, including supervised deep learning-based methods.
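The abstract outlines a three-step pipeline: encoding sentences and the query with a pre-trained sentence embedding model, scoring sentences by a linear combination of BM25 and semantic similarity, and re-ranking with maximal marginal relevance (MMR). The sketch below illustrates that pipeline under stated assumptions; the encoder (`all-MiniLM-L6-v2` via `sentence-transformers`), the use of the `rank_bm25` library, the interpolation weight `alpha`, and the MMR trade-off `lambda_` are illustrative choices, not the settings reported in the paper.

```python
# Minimal sketch of the pipeline described in the abstract.
# The model name, alpha, and lambda_ below are assumptions for illustration.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

def qf_mds(sentences, query, k=5, alpha=0.5, lambda_=0.7):
    # 1. Embed sentences and the query with a pre-trained sentence encoder.
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder
    sent_vecs = model.encode(sentences)
    query_vec = model.encode([query])[0]

    # 2. Semantic relevance: cosine similarity between query and sentences.
    sent_norms = np.linalg.norm(sent_vecs, axis=1)
    semantic = sent_vecs @ query_vec / (sent_norms * np.linalg.norm(query_vec))

    # 3. Lexical relevance: BM25 scores of each sentence for the query.
    bm25 = BM25Okapi([s.lower().split() for s in sentences])
    lexical = bm25.get_scores(query.lower().split())
    peak = lexical.max()
    if peak > 0:
        lexical = lexical / peak  # normalize to make scales comparable

    # 4. Linear combination of BM25 and semantic similarity.
    relevance = alpha * lexical + (1 - alpha) * semantic

    # 5. MMR re-ranking: trade query relevance against redundancy
    #    with respect to already-selected sentences.
    sim = sent_vecs @ sent_vecs.T / np.outer(sent_norms, sent_norms)
    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        mmr = [lambda_ * relevance[i]
               - (1 - lambda_) * max((sim[i][j] for j in selected), default=0.0)
               for i in candidates]
        best = candidates[int(np.argmax(mmr))]
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```

On the first MMR iteration no sentence has been selected yet, so the redundancy penalty defaults to zero and the most query-relevant sentence is picked; subsequent picks are penalized by their maximum similarity to the summary so far, which is what keeps the extracted summary non-redundant.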
