Natural Language Processing for Automatic Text Summarization [Datasets] - Survey

Alaa Ahmed Al-Banna, Abeer K Al-Mashhadany

doi:10.31185/wjcm.72

Open Access

Abstract

Natural language processing has developed significantly in recent years, and this progress has advanced the text summarization task. Summarization is no longer limited to reducing text length or extracting useful information from a long document. It is now also used for answering questions from summaries, measuring the quality of sentiment analysis systems, research and mining techniques, document categorization, and natural language inference, which has increased the importance of research into producing good summaries. This paper reviews the most widely used text summarization datasets across different languages and types, together with the most effective methods for each dataset. Results are reported using text summarization evaluation metrics. The review indicates that pre-trained models achieved the highest scores on these metrics for most datasets in the surveyed works. English-language datasets make up about 75% of the datasets available to researchers, owing to the widespread use of English. Other languages, such as Arabic and Hindi, suffer from a scarcity of dataset resources, which has limited academic progress for them.
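The abstract does not name the evaluation metrics, but ROUGE is the most commonly reported family of summarization metrics. The following is a minimal sketch of ROUGE-1 (unigram overlap between a candidate summary and a reference summary). It assumes plain lowercase whitespace tokenization with no stemming, and the function name `rouge_1` is hypothetical. It is an illustration only, not the evaluation code used in the surveyed works. Published scores are normally computed with a reference implementation such as the `rouge-score` package.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Simplified ROUGE-1 precision, recall and F1 based on unigram overlap."""
    # Assumption: lowercase and split on whitespace (no stemming, no punctuation handling).
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each unigram counts at most min(count in candidate, count in reference).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(cand.values())  # share of candidate unigrams found in the reference
    recall = overlap / sum(ref.values())      # share of reference unigrams covered by the candidate
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat lay on the mat"
    print(rouge_1(candidate, reference))
```

In the example pair, five of the six unigrams overlap ("the" twice, plus "cat", "on" and "mat"). Precision, recall and F1 are therefore each 5/6.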