Indian Regional Language Abstractive Text Summarization using Attention-based LSTM Neural Network

Rishabh Karmakar, Ketki Nirantar, Deptii Chaudhari, Pooja Hiremath, Prathamesh Kurunkar

doi:10.1109/conit51480.2021.9498309

Abstract

Text summarization is the process of condensing a block of text into a short, precise, and understandable version that conveys the complete meaning of the original in fewer words while retaining its context. Literature and texts in regional languages are often difficult to comprehend because corresponding summaries that convey the idea of the text are lacking. Abstractive text summarization has been widely studied for English; however, it is still at a nascent stage for Indian regional languages. The acute paucity of regional datasets is a major challenge for researchers working in this field. In this paper, we address the dataset scarcity for Indian regional languages such as Hindi and Marathi, and we propose two new deep learning architectures for abstractive text summarization: attention-based and stacked LSTM-based Sequence-to-Sequence (Seq2Seq) neural networks. These models are supported by Hindi and Marathi stop-word and rare-word lists that we created for pre-processing. Our novel approach enables the models to accept text in Hindi and Marathi and to produce a succinct summary that lucidly conveys the gist of the original text.
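The pre-processing step mentioned above, filtering stop words and rare words before text is fed to a Seq2Seq model, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word sets contain only a handful of common Hindi and Marathi function words rather than the lists the authors built, tokenization is a simple split on whitespace and punctuation (including the danda "।"), and a "rare word" is assumed to be any token seen fewer than `min_count` times in the training corpus. The threshold value and the choice to drop, rather than replace, rare tokens are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# Illustrative samples only (assumption): NOT the stop-word lists built by the authors.
HINDI_STOPWORDS = {"का", "की", "के", "है", "और", "में", "से", "को", "यह"}
MARATHI_STOPWORDS = {"आणि", "आहे", "या", "व", "ते"}

# Danda, double danda and common ASCII punctuation are treated as token separators.
PUNCT_RE = re.compile(r"[।॥.,!?;:\"'()\[\]-]")


def tokenize(text: str) -> list[str]:
    """Replace punctuation with spaces, then split on (Unicode) whitespace."""
    return PUNCT_RE.sub(" ", text).split()


def build_rare_words(corpus: list[str], min_count: int = 2) -> set[str]:
    """Hypothetical rule: a token is rare if it occurs fewer than min_count times."""
    counts = Counter(tok for doc in corpus for tok in tokenize(doc))
    return {tok for tok, c in counts.items() if c < min_count}


def preprocess(text: str, stopwords: set[str], rare_words: set[str]) -> list[str]:
    """Tokenize and drop both stop words and rare words, keeping the original order."""
    return [t for t in tokenize(text) if t not in stopwords and t not in rare_words]


if __name__ == "__main__":
    corpus = [
        "भारत की राजधानी दिल्ली है।",
        "दिल्ली भारत का प्रमुख शहर है।",
        "मुंबई भारत का प्रमुख शहर है।",
    ]
    rare = build_rare_words(corpus, min_count=2)
    for doc in corpus:
        print(preprocess(doc, HINDI_STOPWORDS, rare))
```

In this toy corpus, "की" is removed as a stop word, while "राजधानी" and "मुंबई" are removed because each appears only once. In practice the stop-word list would be the language-specific one (Hindi or Marathi) and the rare-word threshold would be tuned on the training data.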
