An Improved Extractive Summarization Technique for Bengali Text(s)

M. I. Afjal, M. A. A. Mamun, P. B. Tumpa, S. Yeasmin, M. P. Uddin, A. M. Nitu

doi:10.1109/ic4me2.2018.8465609

Abstract

Text summarization has become an important tool for retrieving required information quickly. Many extractive summarization techniques have been developed for English text, but few exist for Bengali text. This paper proposes an improved extractive Bengali text summarization technique that enhances the word scoring process, the position value heuristic, and the summary procedure of an existing summarizer. For word scoring, each word is first preprocessed through noise removal, tokenization, stop word removal, and stemming. A heuristic then computes the word score by checking the word across all input documents. For sentence scoring, a modified heuristic gives the highest priority to the middle sentence and less emphasis to the sentences above and below it. Finally, the top k sentences are extracted from each cluster of sentences and sorted by their order of appearance in the original document(s), so that the final summary stays synchronized with the original document(s). Experimental results show that, compared with the preceding method, the proposed technique produces summaries that better satisfy users.
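The pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring formulas, so the word score is assumed here to be a plain frequency count over the whole input, and the position weight is assumed to decay linearly from the middle sentence. The function names `position_weights` and `summarize` are hypothetical, tokenization is simple whitespace splitting, and the noise removal, Bengali stemming, and sentence clustering steps are omitted.

```python
from collections import Counter


def position_weights(n):
    """Return n weights that peak at the middle sentence and decay
    linearly toward both ends (an assumed form of the heuristic)."""
    mid = (n - 1) / 2
    return [1.0 - abs(i - mid) / (mid + 1) for i in range(n)]


def summarize(sentences, k, stop_words=frozenset()):
    """Pick the k highest-scoring sentences and return them in their
    original order of appearance."""
    # Tokenize on whitespace and drop stop words (stemming omitted).
    tokens = [[w for w in s.split() if w not in stop_words] for s in sentences]

    # Assumed word score: frequency of the word across the whole input.
    freq = Counter(w for t in tokens for w in t)

    # Sentence score: mean word score scaled by the position weight.
    weights = position_weights(len(sentences))
    scores = [
        weights[i] * sum(freq[w] for w in t) / max(len(t), 1)
        for i, t in enumerate(tokens)
    ]

    # Select the top-k indices by score, then restore document order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Sorting the selected indices before returning is what keeps the summary aligned with the source text: a sentence that scores highest but appears later in the document still follows the earlier ones in the output.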

Full Text