Corpus-based web document summarization using statistical and linguistic approach

Rushdi Shams, Afrina Hossain, Monika Gope, Suraiya Rumana Akter, M M A Hashem

doi:10.1109/iccce.2010.5556854

Abstract

Single-document summarization generates a summary by extracting representative sentences from the document. In this paper, we present a novel technique for summarizing domain-specific text from a single web document, using statistical and linguistic analysis of the text in a reference corpus and in the web document. The proposed summarizer combines Sentence Weight (SW) and Subject Weight (SuW) to determine the rank of a sentence. SW is a function of the number of terms (t_n) and the number of words (w_n) in a sentence and of the term frequency (t_f) in the corpus; SuW is a function of t_n and w_n in a subject and of t_f in the corpus. The top 30 percent of the ranked sentences are taken as the summary of the web document. We generated three web document summaries with our technique and compared each of them with summaries written manually by 16 different human subjects. Results showed that 68 percent of the summaries produced by our approach satisfy the manual summaries.
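The ranking step described above can be illustrated with a minimal sketch. The abstract does not give the exact formulas for SW and SuW, so everything specific here is an assumption made for illustration: the helper names (`tokenize`, `split_sentences`, `weight`, `summarize`), the weight formula (term density t_n / w_n times the summed corpus frequency t_f of the terms), the plain sum SW + SuW as the combination, and the use of the first few words of a sentence as a crude stand-in for its grammatical subject. It should not be read as the authors' implementation.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Naive split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def weight(words, corpus_tf):
    """Hypothetical weight: (t_n / w_n) * sum of corpus frequencies of the terms.

    A "term" is assumed here to be any word that occurs in the reference corpus.
    """
    if not words:
        return 0.0
    terms = [w for w in words if w in corpus_tf]
    t_n, w_n = len(terms), len(words)
    return (t_n / w_n) * sum(corpus_tf[t] for t in terms)


def summarize(document, corpus_texts, ratio=0.3, subject_len=3):
    """Rank sentences by SW + SuW (assumed combination) and keep the top `ratio`."""
    corpus_tf = Counter(w for text in corpus_texts for w in tokenize(text))
    sentences = split_sentences(document)
    scored = []
    for index, sentence in enumerate(sentences):
        words = tokenize(sentence)
        sw = weight(words, corpus_tf)
        # Assumption: the first `subject_len` words approximate the subject.
        suw = weight(words[:subject_len], corpus_tf)
        scored.append((sw + suw, index))
    keep = max(1, int(ratio * len(sentences)))
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:keep]
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(i for _, i in top)]


corpus = [
    "network routing protocol network packet",
    "routing protocol packet network",
]
doc = (
    "The weather was nice today. "
    "Network routing protocol forwards each packet. "
    "We had lunch. Nobody came."
)
summary = summarize(doc, corpus)
```

A real system would replace the subject stand-in with a parser-based subject extractor and filter stop words before counting terms; the sketch only shows how corpus term frequency and per-sentence counts can feed a 30 percent cut-off.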
