Abstract

Text summarization is the task of extracting or abstracting a condensed form of an original text document while retaining its information and complete meaning. Creating summaries manually from large text documents is difficult and time-consuming. Text summarization has therefore become an important and challenging area in natural language processing. This paper presents a heuristic approach to extracting summaries of Telugu e-news articles. The method proposes new lexical parameter-based information extraction (IE) rules for scoring sentences. The event score and the named entity score are the novel components of sentence scoring, used to identify the essential information in the text. Sentences are selected for the summary depending on the frequency of occurrence of events and named entities in the sentence and in the document. For the experiments, data was collected from online news sources, namely Eenadu, Sakshi, Andhra Jyothi, and Namaste Telangana. The proposed method is compared with other techniques developed for Telugu text summarization. Precision, recall, and F1 score are used to measure the proposed method's performance. An extensive statistical and qualitative evaluation of the system's summaries was conducted using Recall-Oriented Understudy for Gisting Evaluation (ROUGE), a standard summary evaluation tool. The results show improved performance compared to other methods.
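To make the evaluation metrics concrete, the following is a minimal sketch of unigram-overlap (ROUGE-1 style) precision, recall, and F1 between a system summary and a reference summary. It is an illustration only, not the evaluation tool used in the paper; the function name `rouge_1` and the assumption that both summaries are already tokenized are ours.

```python
from collections import Counter


def rouge_1(candidate_tokens, reference_tokens):
    """Return (precision, recall, f1) from clipped unigram overlap.

    Both arguments are lists of tokens. Tokenization is assumed to have
    been done upstream (hypothetical preprocessing step).
    """
    cand = Counter(candidate_tokens)
    ref = Counter(reference_tokens)
    # Counter intersection keeps the minimum count of each shared token,
    # so a token repeated in the candidate is not over-credited.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Precision here is the share of summary tokens that also appear in the reference, recall is the share of reference tokens covered by the summary, and F1 is their harmonic mean.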

Highlights

  • For English, several advancements have been made in the field of text summarization, but not for Indian languages

  • This paper proposes an improved sentence ranking approach that generates effective summaries of Telugu text documents based on occurrences of events and named entities in the text

  • This paper proposes a heuristic-based method of extractive text summarization with an improved sentence ranking mechanism for Telugu text documents


Introduction

For English, several advancements have been made in the field of text summarization, but not for Indian languages. Telugu is an agglutinative language, so summarization systems developed for other Indian languages such as Hindi and Bengali do not carry over to Telugu. Text summarization mines the significant sentences of a document to generate a summary of the whole. Depending on the type of summary, text summarization methods are broadly classified as extractive or abstractive. Abstractive summarization methods interpret the source document and rewrite its sentences to obtain summaries. This paper proposes an improved sentence ranking approach that generates effective summaries of Telugu text documents based on occurrences of events and named entities in the text. The rest of the paper is organized as follows: Section 2 reviews the literature on summarization techniques developed for Indian languages.
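The idea of ranking sentences by occurrences of events and named entities can be sketched as follows. This is a simplified illustration under our own assumptions, not the paper's IE rules or scoring formula, which are not given in this excerpt: `rank_sentences`, `key_terms`, and `top_k` are hypothetical names, and the set of event and named entity tokens is assumed to come from an upstream tagger.

```python
from collections import Counter


def rank_sentences(sentences, key_terms, top_k=2):
    """Pick the top_k sentences by frequency-weighted key-term score.

    sentences: list of token lists, one per sentence.
    key_terms: set of tokens tagged as events or named entities
               (assumed to be produced by a separate tagger).
    Returns the indices of the selected sentences in document order.
    """
    # Document-level frequency of each event / named entity token.
    doc_freq = Counter(
        tok for sent in sentences for tok in sent if tok in key_terms
    )
    # A sentence's score is the sum of the document frequencies of the
    # key terms it contains (an illustrative scoring choice).
    scored = [
        (sum(doc_freq[tok] for tok in sent if tok in key_terms), idx)
        for idx, sent in enumerate(sentences)
    ]
    # Highest score first; ties go to the earlier sentence.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:top_k]
    # Restore original order so the summary reads naturally.
    return sorted(idx for _, idx in top)
```

Returning the selected indices in document order keeps the extracted sentences in the sequence in which they appeared in the article.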

