Abstract
<p>As the amount of textual information increases, so does the need for automatic text summarizers. In automatic summarization, a text document or a larger corpus of multiple documents is reduced to a short set of words or a paragraph that conveys the main meaning of the text. Summarization can be classified into two approaches: extraction and abstraction. This paper focuses on the extraction approach. The goal of extraction-based text summarization is sentence selection, and its first step is the identification of important features. In our approach, short stories and biographies are used as test documents. Each document is prepared by pre-processing: sentence segmentation, tokenization, stop word removal, case folding, lemmatization, and stemming. Then, using the important features, sentence filtering and data compression are performed, and finally a score is calculated for each sentence. In this paper we propose various features for summary extraction and analyze which features should be applied depending on the size of the document. The experimentation is performed with the DUC 2002 dataset. Comparative results of the proposed approach and of MS-Word are also presented. The concept-based features are given more weightage. From these results we propose that the use of concept-based features helps improve the quality of the summary in the case of large documents.</p>
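The pipeline described in the abstract (pre-processing, then feature-based sentence scoring and selection) can be illustrated with a minimal sketch. The abstract does not specify the paper's feature set or weights, so everything concrete here is an assumption made for illustration: the two features (average term frequency and sentence position), the `position_weight` parameter, the tiny `STOP_WORDS` list, and the `crude_stem` helper, which is only a stand-in for a real stemmer. Lemmatization, sentence filtering, data compression, and the concept-based features are not modeled.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; not the paper's list).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "of", "in", "on",
              "and", "or", "to", "it", "that", "this", "for", "with", "as", "by"}


def split_sentences(text):
    """Sentence segmentation: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def crude_stem(word):
    """Very rough suffix stripping; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(sentence):
    """Case folding, tokenization, stop word removal, and stemming."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def summarize(text, num_sentences=2, position_weight=0.1):
    """Score each sentence and return the top ones in document order.

    Score = average corpus frequency of the sentence's terms
            + a small bonus for appearing early (both are assumed features).
    """
    sentences = split_sentences(text)
    token_lists = [preprocess(s) for s in sentences]
    freq = Counter(t for tokens in token_lists for t in tokens)

    scores = []
    for i, tokens in enumerate(token_lists):
        if not tokens:
            scores.append(0.0)
            continue
        tf_score = sum(freq[t] for t in tokens) / len(tokens)
        pos_score = position_weight * (len(sentences) - i) / len(sentences)
        scores.append(tf_score + pos_score)

    # Pick the highest-scoring sentences, then restore original order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(top[:num_sentences])]
```

Calling `summarize(document_text, num_sentences=3)` returns the three highest-scoring sentences in their original order. A feature-weighting scheme like the one the abstract mentions would replace the simple sum of `tf_score` and `pos_score` with a weighted combination over more features.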
More From: IAES International Journal of Artificial Intelligence (IJ-AI)