Abstract

Automatic text summarization involves extracting relevant details from input text documents to generate summaries. This area of Natural Language Processing is widely researched, especially for widely used languages such as English. There is a need to extend this work to the less commonly spoken languages of the world. This paper presents a language-independent text summarization approach using Latent Semantic Analysis, applied to the Konkani language. Konkani is a low-resource language with limited language processing tools, stop-word lists, etc. Latent Semantic Analysis (LSA) is an unsupervised algebraic method that uncovers latent semantic structures in text, which can then be used for extractive text summarization. We examined well-known LSA-based sentence selection approaches on our dataset, constructed from books of Konkani folk tales written in Devanagari script. The experimental results indicate that LSA-based approaches can produce promising summaries, with the Cross method performing best on most metrics.

Keywords: Automatic text summarization; Latent semantic analysis; Konkani; Low-resource; Singular value decomposition; Extractive text summarization
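
As a rough illustration of how LSA-based sentence selection works, the sketch below builds a term-by-sentence count matrix, applies singular value decomposition, and ranks sentences in the spirit of the Cross method. It is not the authors' implementation. The function name `lsa_cross_summary`, the whitespace tokenizer, the raw-count weighting, the use of absolute values (to avoid the sign ambiguity of SVD), and the singular-value-weighted length score are all assumptions made for this example.

```python
import numpy as np

def lsa_cross_summary(sentences, k=2, n_concepts=None):
    """Return the indices of k sentences chosen by a simplified Cross-style LSA score.

    Hypothetical helper: tokenization, weighting and scoring are illustrative assumptions.
    """
    # Whitespace tokenization needs no language-specific tools (assumption).
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-by-sentence matrix of raw term counts.
    A = np.zeros((len(vocab), len(sentences)))
    for j, toks in enumerate(tokenized):
        for w in toks:
            A[index[w], j] += 1.0

    # Rows of Vt are latent concepts; columns correspond to sentences.
    _, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    if n_concepts is not None:
        sigma, Vt = sigma[:n_concepts], Vt[:n_concepts]

    # Cross-style step: within each concept, zero out sentences whose
    # (absolute) weight does not exceed that concept's average.
    V = np.abs(Vt)
    V[V <= V.mean(axis=1, keepdims=True)] = 0.0

    # Sentence score: length of its concept vector, weighted by singular values.
    scores = np.sqrt(((sigma[:, None] * V) ** 2).sum(axis=0))

    # Keep the k highest-scoring sentences, returned in document order.
    top = np.argsort(-scores)[:k]
    return sorted(int(i) for i in top)

if __name__ == "__main__":
    toy = [
        "the fox tricked the crow",
        "the crow dropped the cheese",
        "the fox ate the cheese",
        "rain fell on the village",
    ]
    for i in lsa_cross_summary(toy, k=2):
        print(toy[i])
```

Because the sketch relies only on whitespace splitting and term counts, it can be run on Devanagari text without a stemmer or stop-word list, which is the sense in which such a pipeline is language-independent. A realistic setup would likely add the preprocessing and weighting choices evaluated in the paper.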
