Abstract

This paper proposes an abstractive Arabic text summarization model based on sequence-to-sequence recurrent neural networks. The model consists of a multilayer encoder and a single-layer decoder. The encoder layers use bidirectional long short-term memory (LSTM), whereas the decoder uses unidirectional LSTM. The encoder layers are the input text layer, the keywords layer and the named entities layer. The decoder uses a global attention mechanism that considers all the input hidden states to generate the summary words. The experiments are conducted on a dataset collected from several sources. The quality of the generated summaries is measured both quantitatively and qualitatively. For the quantitative evaluation, three new measures are proposed in addition to ROUGE1: ROUGE1-NOORDER, ROUGE1-STEM and ROUGE1-CONTEXT. One reason for proposing new measures is that the abstractive nature of the summaries calls for more context-based evaluation. Another reason is the morphological nature of the Arabic language, in which several words can be generated from the same root using morphemes. In addition, a qualitative evaluation performed by a human is used to assess the readability and relevance of the generated summaries, since these properties are hard to measure automatically. The experimental results show that the multilayer encoder models provide the best results, with the proposed model reaching ROUGE1, ROUGE1-NOORDER, ROUGE1-STEM and ROUGE1-CONTEXT values of 38.4, 46.2, 52.6 and 58.1, respectively. Furthermore, the qualitative evaluation shows that the proposed model performs best, achieving an average readability and relevance score of 75.9%.
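
To make the global attention step concrete, the following is a minimal sketch of how a decoder state can attend over all encoder hidden states. It is an illustration under stated assumptions, not the model described in this paper: the abstract does not specify the scoring function, so a plain dot-product score is assumed here, and the names `softmax`, `global_attention`, `decoder_state` and `encoder_states` are hypothetical. How the text, keywords and named entities layers are combined is also not stated, so the sketch uses a single list of encoder states.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(decoder_state, encoder_states):
    """Attend over ALL encoder hidden states (global attention).

    Assumption: dot-product scoring; the paper's actual score
    function is not given in the abstract.
    """
    # One score per encoder hidden state.
    scores = [
        sum(d * h for d, h in zip(decoder_state, enc))
        for enc in encoder_states
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder states.
    dim = len(decoder_state)
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three encoder hidden states of dimension 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [1.0, 1.0]
weights, context = global_attention(decoder_state, encoder_states)
```

In this toy input the third encoder state is most similar to the decoder state, so it receives the largest weight; in a full decoder the resulting context vector would be combined with the decoder state before predicting the next summary word.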
