An optimized hybrid deep learning model based on word embeddings and statistical features for extractive summarization

Yaser M Wazery, Marwa E Saleh, Abdelmgeid A Ali

doi:10.1016/j.jksuci.2023.101614

Open Access

Abstract

Extractive summarization has recently gained significant attention as a sentence-level classification problem. Most current summarization methods rely on a single way of representing the sentences in a document (e.g., extracted features, word embeddings, or BERT embeddings). However, combining two sentence representations can improve both classification performance and the quality of the generated summary. This paper presents a novel extractive text summarization method for single documents based on word embeddings and statistical features. Each sentence is encoded by a Convolutional Neural Network (CNN) and a Feed-Forward Neural Network (FFNN), applied to its word embeddings and its statistical features, respectively. The CNN and FFNN outputs are concatenated, and a Multilayer Perceptron (MLP) classifies the sentence from the combined representation. In addition, the hyperparameters of the hybrid model are tuned with KerasTuner to find the most efficient configuration. The proposed method was evaluated on the standard Newsroom dataset. Experiments show that it effectively captures the document’s semantic and statistical information and outperforms deep learning, machine learning, and state-of-the-art approaches, with scores of 78.64, 74.05, and 72.08 for ROUGE-1, ROUGE-2, and ROUGE-L, respectively.
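
The two-branch design described in the abstract can be illustrated with a minimal NumPy forward pass. This is a sketch under stated assumptions, not the authors' implementation: the layer sizes, the ReLU activations, the global max pooling, the sigmoid output, and all names (`cnn_branch`, `ffnn_branch`, `classify_sentence`, `params`) are illustrative choices that the abstract does not specify, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the abstract does not state any of them.
SEQ_LEN, EMB_DIM = 12, 16      # words per sentence, embedding size
N_FILTERS, WIDTH = 8, 3        # convolution filters and window width
N_STATS, FF_UNITS = 5, 4       # statistical features, FFNN units
HIDDEN = 6                     # MLP hidden units


def relu(x):
    return np.maximum(x, 0.0)


def dense(x, w, b):
    return x @ w + b


def cnn_branch(emb, kernels, bias):
    """Encode word embeddings (seq_len, emb_dim) with a 1-D convolution."""
    width = kernels.shape[1]
    n_windows = emb.shape[0] - width + 1
    # Every window of `width` consecutive word vectors: (n_windows, width, emb_dim)
    windows = np.stack([emb[i:i + width] for i in range(n_windows)])
    # Apply each filter to each window, then ReLU: (n_windows, n_filters)
    feature_maps = relu(np.einsum("nwd,fwd->nf", windows, kernels) + bias)
    # Global max pooling over positions: (n_filters,)
    return feature_maps.max(axis=0)


def ffnn_branch(stats, w, b):
    """Encode the statistical feature vector with one dense layer."""
    return relu(dense(stats, w, b))


def classify_sentence(emb, stats, p):
    """Concatenate both encodings and score the sentence with an MLP."""
    semantic = cnn_branch(emb, p["conv_k"], p["conv_b"])
    statistical = ffnn_branch(stats, p["ff_w"], p["ff_b"])
    joint = np.concatenate([semantic, statistical])
    hidden = relu(dense(joint, p["mlp_w1"], p["mlp_b1"]))
    logit = dense(hidden, p["mlp_w2"], p["mlp_b2"])[0]
    # Sigmoid: probability that the sentence belongs in the summary
    return float(1.0 / (1.0 + np.exp(-logit)))


# Random, untrained parameters, for shape checking only.
params = {
    "conv_k": rng.normal(0, 0.1, (N_FILTERS, WIDTH, EMB_DIM)),
    "conv_b": np.zeros(N_FILTERS),
    "ff_w": rng.normal(0, 0.1, (N_STATS, FF_UNITS)),
    "ff_b": np.zeros(FF_UNITS),
    "mlp_w1": rng.normal(0, 0.1, (N_FILTERS + FF_UNITS, HIDDEN)),
    "mlp_b1": np.zeros(HIDDEN),
    "mlp_w2": rng.normal(0, 0.1, (HIDDEN, 1)),
    "mlp_b2": np.zeros(1),
}

emb = rng.normal(size=(SEQ_LEN, EMB_DIM))   # stand-in for word embeddings
stats = rng.random(N_STATS)                 # stand-in for statistical features
prob = classify_sentence(emb, stats, params)
print(f"summary-inclusion score: {prob:.3f}")
```

In a trained model of this shape, sentences would be ranked or thresholded by this score to build the extractive summary. Values such as `N_FILTERS`, `WIDTH`, and `HIDDEN` are the kind of hyperparameters a tuner like KerasTuner would search over.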
