Abstract

Automatic text summarization reduces a text document, or a larger corpus of multiple documents, to a short set of sentences or paragraphs that conveys the main meaning of the text. In this paper, we discuss multi-document summarization, which differs from single-document summarization in that the issues of compression, speed, redundancy and passage selection are critical to forming useful summaries. Because the number and variety of online medical news items make it difficult for experts in the medical field to read all of them, automatic multi-document summarization can make information on the web easier to study. We therefore propose a new summarization approach based on the machine learning meta-learner algorithm AdaBoost. We treat a document as a set of sentences, and the learning algorithm must learn to classify sentences as positive or negative examples based on their scores. For this learning task, we apply the AdaBoost meta-learning algorithm with a C4.5 decision tree as the base learner. In our experiment, we use 450 news items downloaded from different medical websites. We then compare our results with some existing approaches.

Summary

Introduction

A large amount of medical news is now available online, and experts in the medical field cannot study all of it [1]. We present a machine-learning-based model for sentence-extraction-based, multi-document, informative text summarization in the medical domain (this work is an improvement of the study proposed in [5]). We treat a document as a set of sentences, each of which must be classified as a positive or negative example according to its summary worthiness. Each sentence is represented by a feature set that includes a number of features used in the summarization literature and some other features specific to the medical domain. The first and better understood effect of boosting is that it generates a hypothesis whose error on the training set is small by combining many hypotheses whose error may be large (but still better than random guessing). It seems that boosting may be helpful on learning problems having either of the following two properties.
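
As a rough illustration of the boosting idea in the paragraph above (many weak hypotheses combined into one with a small training error), the following is a minimal sketch of AdaBoost applied to sentence classification. It is not the authors' implementation. It assumes one-level decision stumps as the base learner in place of C4.5, and the two sentence features (a position score and a medical-term score), the six toy sentences, and their labels are all hypothetical.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) for the best stump.

    A stump predicts `sign` when row[feature] >= threshold, else -sign.
    """
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[j] >= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] >= thr else -sign

def adaboost_fit(X, y, rounds=3):
    """Train up to `rounds` stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sentence weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0)
        if err >= 0.5:                     # no better than random guessing
            break
        alpha = 0.5 * math.log((1 - err) / err)     # vote weight of this stump
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified sentences, lower the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Weighted vote of all stumps: +1 = summary-worthy, -1 = not."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical feature vectors: (position_score, medical_term_score).
X = [(0.9, 0.2), (0.8, 0.1), (0.2, 0.9), (0.1, 0.8), (0.3, 0.3), (0.4, 0.4)]
# Hypothetical labels: +1 = belongs in the summary, -1 = does not.
y = [1, 1, 1, 1, -1, -1]

ensemble = adaboost_fit(X, y, rounds=3)
print([adaboost_predict(ensemble, row) for row in X])
```

On this toy data the first stump alone labels every sentence as positive and so makes two mistakes, while the weighted vote of the three stumps reproduces all six labels. In a full system, the selected positive sentences would still need redundancy handling and ordering before a summary is produced.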

Related Work
Summarization Method
Using AdaBoost for Sentence Extraction
Summary generation
Experimental Results
Proposed Method Mead
