Abstract

Automatic text summarization is a technique that compresses a large text into a shorter one that retains the important information. Hindi is the most widely used language in India and is also spoken in a few neighboring countries, yet there is a lack of proper summarization systems for Hindi text. Hence, in this paper, we present an approach to the design of an automatic text summarizer for Hindi text that generates a summary by extracting sentences. It performs single-document summarization using a machine learning approach. Each sentence in the document is represented by a set of features, namely sentence paragraph position, sentence overall position, numeric data, presence of inverted commas, sentence length, and keywords in the sentence. The sentences are classified into one of four classes, namely most important, important, less important, and not important. These classes carry ranks from 4 to 1 respectively, with "4" indicating the most important sentences and "1" the least relevant. Next, the supervised machine learning tool SVM rank is used to train the summarizer to extract important sentences based on the feature vector. The sentences are ordered according to the ranking of their classes and then, based on the required compression ratio, included in the final summary. The experiment was performed on news articles of different categories such as Bollywood, politics, and sports. The performance of the technique is evaluated by comparison with human-generated summaries. On average, the experiments indicate 72% accuracy at a 50% compression ratio and 60% accuracy at a 25% compression ratio.
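
The feature representation and the rank-based selection step described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `sentence_features` and `select_summary`, the exact feature encodings, the reading of the compression ratio as the fraction of sentences kept, and the restoring of document order in the output are all assumptions made for illustration. The ranks (1 to 4) are assumed to come from an already trained ranker, which is not shown, and the paragraph-position feature is omitted for brevity.

```python
import math
import re


def sentence_features(sentence, index, total, keywords):
    """Build an illustrative feature dictionary for one sentence.

    The encodings below are assumptions; the abstract names the
    features but does not say how each one is computed.
    """
    words = sentence.split()
    return {
        # Relative position of the sentence in the whole document.
        "overall_position": (index + 1) / total,
        # 1 if the sentence contains any digit (\d also matches Devanagari digits).
        "has_numeric": int(bool(re.search(r"\d", sentence))),
        # 1 if the sentence contains straight or curly quotation marks.
        "has_quotes": int(any(q in sentence for q in "\"'“”‘’")),
        # Sentence length in whitespace-separated tokens.
        "length": len(words),
        # Number of tokens that appear in the given keyword set.
        "keyword_count": sum(1 for w in words if w in keywords),
    }


def select_summary(sentences, ranks, compression_ratio):
    """Keep the highest-ranked sentences up to the compression ratio.

    ranks[i] is the class rank (4 = most important, 1 = not important)
    assigned to sentences[i] by a ranker that is assumed to exist.
    """
    keep = max(1, math.ceil(len(sentences) * compression_ratio))
    # Order by rank, highest first; ties go to the earlier sentence.
    order = sorted(range(len(sentences)), key=lambda i: (-ranks[i], i))
    # Assumption: the chosen sentences are emitted in document order.
    chosen = sorted(order[:keep])
    return [sentences[i] for i in chosen]
```

For example, with four sentences ranked `[1, 4, 2, 4]` and a compression ratio of 0.5, `select_summary` keeps the second and fourth sentences, in that order.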
