Abstract

Automatic text summarization helps readers grasp the content of a text document at a glance. With this motivation, we introduce a two-stage hybrid model for the text summarization task that combines the strengths of several approaches. In the first stage, we cluster the sentences of a document according to their similarity using a partitional clustering algorithm. The dissimilarity between two sentences is measured by a linear combination of the normalized Google distance and the word mover’s distance. The gap statistic is used to estimate the number of partitions that the partitional clustering algorithm requires for a given document. In the second stage, we extract the significant sentences from each cluster (partition), which are identified by their adjusted text feature scores. The teaching–learning based optimization approach is used to find the optimal weights for the text features, whereas a fuzzy inference system with a complete, human-generated knowledge base is employed to determine the final score of each sentence. We also propose an exact method that solves the summarization problem by modeling it as an Integer Linear Programming (ILP) problem. We evaluate the proposed methods on three datasets: DUC 2001, DUC 2002, and CNN. The results on these standard datasets demonstrate the effectiveness of the proposed methods. We further show that partitioning a document into an optimal number of clusters plays a major role in the content coverage of summaries. The performance of the proposed hybrid method shows that the combination of fuzzy, evolutionary, and clustering algorithms produces good summaries of the documents.
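The sentence dissimilarity described in the abstract can be illustrated with a minimal sketch. The normalized Google distance below follows its standard definition from occurrence counts; everything else is an assumption made for illustration and is not taken from the paper: the function names `ngd` and `combined_distance`, the convex weighting through a hypothetical parameter `alpha`, and the treatment of the word mover’s distance as a value computed elsewhere.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google distance from occurrence counts.

    fx, fy: number of documents (or pages) containing each term
    fxy:    number containing both terms
    n:      total number of documents indexed (must exceed fx and fy)
    """
    if fxy == 0:
        # The terms never co-occur, so the distance is unbounded.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def combined_distance(ngd_value, wmd_value, alpha=0.5):
    """Linear combination of the two distances.

    alpha is a hypothetical mixing weight; the abstract does not state
    how the two distances are weighted.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * ngd_value + (1.0 - alpha) * wmd_value
```

For example, `combined_distance(ngd(100, 10, 5, 1000), wmd_value, alpha=0.5)` would weight both distances equally, with `wmd_value` supplied by whatever word mover’s distance implementation is in use.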
