Automatic Text Summarization of Konkani Texts Using Latent Semantic Analysis

Jovi D’Silva, Chaitali More, Uzzal Sharma

doi:10.1007/978-981-19-2821-5_37

Abstract

Automatic text summarization involves extracting relevant details from input text documents to generate summaries. This area of Natural Language Processing is widely researched, especially for popular languages like English, and there is a need to extend this work to less commonly spoken languages. This paper presents a language-independent text summarization approach using Latent Semantic Analysis (LSA), applied to the Konkani language. Konkani is a low-resource language with few language processing tools and resources such as stop-word lists. LSA is an unsupervised algebraic method that finds latent semantic structures in text, which can then be used for extractive text summarization. We examined well-known LSA-based sentence selection approaches on our dataset, constructed from books of Konkani folk tales written in Devanagari script. The experimental results indicate that LSA-based approaches can produce promising summaries, with the Cross method performing best on most metrics.

Keywords: Automatic text summarization; Latent semantic analysis; Konkani; Low-resource; Singular value decomposition; Extractive text summarization

Full Text