Abstract
To tackle the overload of redundant textual information that keeps growing across the Internet, this paper investigates a solution based on Automatic Text Summarization (ATS). The idea of ATS is to assist readers, e.g., online users, in obtaining a condensed version of a text, saving the time and effort required to skim a large body of text. However, ATS is considered one of the most complex NLP applications, particularly for Arabic, which has not received the same level of NLP development as Indo-European languages. We therefore present an extractive summarizer (ArDBertSum) for text written in Arabic, relying on the DistilBERT model. In addition, we propose a domain-specific sentence-clause segmenter (SCSAR) to support ArDBertSum in further shortening long or complex sentences. Our experimental results show that ArDBertSum yields the best performance among non-heuristic Arabic summarizers, producing candidate summaries of acceptable quality. The experiments were conducted on the EASC dataset (along with our proposed dataset) and report (1) a statistical evaluation using ROUGE metrics and (2) a dedicated human evaluation. The human evaluation revealed promising perceptions; however, further work is needed to improve the coherence and punctuation of the automatic summaries.
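The abstract does not spell out the ArDBertSum pipeline, but the following minimal sketch illustrates the general idea of DistilBERT-based extractive summarization: embed each sentence with a pre-trained multilingual DistilBERT checkpoint, score sentences against the document centroid, and keep the top-ranked ones in their original order. The checkpoint name, the regex-based sentence splitter, and the centroid-similarity scoring are illustrative assumptions, not the authors' exact method.

```python
# Minimal illustrative sketch of DistilBERT-based extractive summarization.
# NOTE: the checkpoint, the naive sentence splitter, and the centroid-similarity
# scoring are assumptions for illustration; this is not the ArDBertSum pipeline.
import re
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "distilbert-base-multilingual-cased"  # assumed multilingual checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def split_sentences(text: str) -> list[str]:
    # Naive split on Latin and Arabic sentence-ending punctuation.
    parts = re.split(r"[.!?\u061F\u06D4]+", text)
    return [p.strip() for p in parts if p.strip()]

@torch.no_grad()
def embed(sentences: list[str]) -> torch.Tensor:
    enc = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**enc).last_hidden_state             # (batch, seq, dim)
    mask = enc["attention_mask"].unsqueeze(-1).float()  # ignore padding tokens
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean pooling

def summarize(text: str, top_k: int = 3) -> str:
    sentences = split_sentences(text)
    if len(sentences) <= top_k:
        return text
    emb = embed(sentences)
    centroid = emb.mean(dim=0, keepdim=True)
    scores = torch.nn.functional.cosine_similarity(emb, centroid)  # sentence relevance
    keep = sorted(scores.topk(top_k).indices.tolist())  # preserve original order
    return " ".join(sentences[i] for i in keep)
```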
Highlights
In this era of digitalization, tremendous amounts of textual data and electronic documents are being produced and are spreading rapidly over the Internet
Automatic Text Summarization is considered one of the most complex Natural Language Processing (NLP) applications, particularly for Arabic, which has not received the same level of NLP development as Indo-European languages
Towards producing a summarizer for Arabic text that relies on a pre-trained Language Understanding Model (LUM), this paper examines the ability of a fine-tuned DistilBERT to address Arabic Automatic Text Summarization (ATS) and concludes by offering a summarizer, ArDBertSum (a minimal fine-tuning sketch follows this list)
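As a companion to the highlight above, the sketch below shows one common way to fine-tune DistilBERT for extractive summarization: encode each sentence and train a small binary head to predict whether that sentence belongs in the summary. The checkpoint, the independent per-sentence encoding, and the toy Arabic labels are assumptions for illustration; the paper's actual fine-tuning procedure may differ.

```python
# Illustrative sketch: fine-tuning DistilBERT as a per-sentence selector for
# extractive summarization. This simplified setup is an assumption, not the
# paper's exact fine-tuning procedure.
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModel

class SentenceSelector(nn.Module):
    def __init__(self, checkpoint: str = "distilbert-base-multilingual-cased"):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(checkpoint)
        self.encoder = AutoModel.from_pretrained(checkpoint)
        self.head = nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, sentences: list[str]) -> torch.Tensor:
        enc = self.tokenizer(sentences, padding=True, truncation=True,
                             return_tensors="pt")
        cls = self.encoder(**enc).last_hidden_state[:, 0]  # first-token representation
        return self.head(cls).squeeze(-1)                  # one logit per sentence

# One training step against binary "in summary / not in summary" labels.
model = SentenceSelector()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = nn.BCEWithLogitsLoss()

sentences = ["جملة أولى.", "جملة ثانية.", "جملة ثالثة."]  # toy Arabic document
labels = torch.tensor([1.0, 0.0, 1.0])                    # toy extractive labels

logits = model(sentences)
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
```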
Summary
In this era of digitalization, tremendous amounts of textual data and electronic documents are being produced and are spreading rapidly over the Internet. ATS approaches are commonly classified along several dimensions: abstractive, extractive, or hybrid; monolingual or multilingual; domain-specific, generic, or query-driven; single-document or multi-document; and indicative or informative. In detail, the abstractive summarization method aims to construct new sentences (sometimes using a paraphrasing technique [21]) to produce a candidate summary, relying on an understanding of the observed input texts; see [22].

The authors of [13] design an Arabic text summarizer that focuses on reducing redundancy and noisy data in a given multi-document input; their underlying technique is implemented as an unsupervised score-based method.

A key objective of this work is to investigate the overall utility of the DistilBERT model in Arabic summarization using statistical ROUGE metrics. This investigation implicitly estimates the efficiency of the DistilBERT-based intermediate representation of the Arabic input texts (i.e., the word-embedding method), as well as the accuracy of sentence tokenization and scoring.
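To make the ROUGE-based evaluation concrete, the sketch below computes a simplified ROUGE-N (precision, recall, F1) from whitespace-tokenized n-gram overlap. It is an illustrative stand-in for the official ROUGE toolkit presumably used in the paper; the whitespace tokenization and the toy Arabic strings are assumptions.

```python
# Simplified ROUGE-N (precision/recall/F1) via clipped n-gram overlap counting.
# Whitespace tokenization is assumed so that Arabic tokens survive intact;
# this is an illustrative stand-in, not the official ROUGE implementation.
from collections import Counter

def ngrams(tokens: list[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate: str, reference: str, n: int = 1) -> dict:
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())           # clipped n-gram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = (2 * precision * recall / (precision + recall)) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

candidate_summary = "النص الملخص المستخرج من الوثيقة"    # toy candidate summary
reference_summary = "الملخص المرجعي المستخرج من الوثيقة"  # toy human reference

print(rouge_n(candidate_summary, reference_summary, n=1))  # ROUGE-1
print(rouge_n(candidate_summary, reference_summary, n=2))  # ROUGE-2
```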