Abstract

Automatic text summarization is an exertion of contriving the abridged form of a text document covering salient knowledge. Numerous statistical, linguistic, rule-based, and position-based text summarization approaches have been explored for different rich-resourced languages. For under-resourced languages such as Hindi, automatic text summarization is a challenging task and still an unsolved problem. Another issue with such languages is the unavailability of corpus and the inadequacy of the processing tools. In this paper, we proposed an extractive lexical knowledge-rich topic modeling text summarization approach for Hindi novels and stories in which we implemented four independent variants using different sentence weighting schemes. We prepared a corpus of Hindi Novels and stories since the absence of a corpus. We used a smoothing technique for edifying and variety summaries followed by evaluating the efficacy of generated summaries against three metrics (gist diversity, retention ratio, and ROUGE score). The results manifest that the proposed model produces abridge, articulate and coherent summaries. To investigate the performance of the proposed model, we simulate the experiments on the English dataset as well. Further, we compare our models with the baselines and traditional topic modeling approach, where we show that the proposed model has confessed optimal results.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.