A fusion of variants of sentence scoring methods and collaborative word rankings for document summarization

Pradeepika Verma, Anshul Verma, Sukomal Pal

doi:10.1111/exsy.12960

Abstract

Document summarization is an important task in natural language processing that helps address the information overload caused by redundant content. Generating a summary with highly relevant content and maximum coverage is particularly challenging and can be achieved only when redundancy is minimized. This article introduces a novel approach to automatic text summarization, based on sentence scoring and collaborative ranking, that produces summaries with minimal redundancy and improved overall summarization performance. The proposed model is a fusion of weighted and unweighted feature-based sentence scoring methods. Learning the optimal weights of the text features is modelled as an optimization problem. Moreover, the proposed model exploits the strength of collaborative ranking to generate the summary of a given document. Models based on three similarity factors (proximity, significance and singularity) are employed to measure the similarity between the weighted and unweighted sentence scores. The results of the comparison experiment demonstrate that the proposed (PS + Jac) method generates summaries that are closer to the reference summaries and contain minimal redundant content. On average, the proposed (PS + Jac) method generates summaries with 61% accurate content, with improvement rates of up to 40%. Statistical testing also confirms that the performance improvement is significant at the 5% level.

Full Text