Abstract

The information age offers instant access to knowledge that was previously difficult or impossible to find. One consequence is that popular products accumulate far more user reviews than potential buyers have time to read, which makes it hard to extract the most salient product details and the opinions of previous buyers. Multi-document summarization, the process of condensing the content of a collection of documents into a single short summary, is an important tool for conveying the content of these sources while maintaining linguistic quality. Summarization can reduce the time users need to find the information they want in a collection of topically related reviews: instead of reading every review, they read a short summary. In this paper, we propose a new summarizer that combines extractive and abstractive techniques to produce such a high-quality summary. Our hybrid approach is novel in that it includes a new extractive summarizer that uses the KL-divergence measure to identify the sentences that capture the major content of a set of user reviews on a particular product, and retains those sentences in a summary containing the most salient information in the set. The extractive summary is then passed to an adapted BERT deep learning model, which improves its linguistic quality and generates a new, shorter text that conveys the most critical information from the extractive summary. The proposed hybrid summarization approach is simple and easy to understand. We compared our summarizer against three existing summarizers, and it outperformed each of them both in expressing the relevant content of a set of user reviews and in the linguistic quality of the resulting summary.
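
To make the extractive step more concrete, here is a minimal sketch of one common way to select sentences with KL-divergence: greedily add the sentence that brings the summary's word distribution closest to the word distribution of the full review set. This is an illustration under stated assumptions, not the implementation described in the paper. The function names (`tokenize`, `kl_divergence`, `kl_extract`), the regular-expression tokenizer, the smoothing constant `alpha`, and the sentence budget `max_sentences` are all hypothetical choices made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (an assumption; any tokenizer would do)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def kl_divergence(doc_counts, summary_counts, alpha=0.01):
    """KL(P_doc || Q_summary) over the vocabulary of the review set.

    Q is additively smoothed with `alpha` so that words missing from the
    candidate summary do not cause a division by zero.
    """
    vocab = list(doc_counts)
    doc_total = sum(doc_counts.values())
    summary_total = sum(summary_counts.values())
    kl = 0.0
    for word in vocab:
        p = doc_counts[word] / doc_total
        q = (summary_counts[word] + alpha) / (summary_total + alpha * len(vocab))
        kl += p * math.log(p / q)
    return kl


def kl_extract(sentences, max_sentences=2):
    """Greedily pick sentences that minimize KL-divergence to the full set."""
    tokens = [tokenize(s) for s in sentences]
    doc_counts = Counter(t for toks in tokens for t in toks)
    chosen = []
    summary_counts = Counter()
    while len(chosen) < min(max_sentences, len(sentences)):
        best_index, best_kl = None, float("inf")
        for i, toks in enumerate(tokens):
            if i in chosen or not toks:
                continue
            candidate = summary_counts + Counter(toks)
            kl = kl_divergence(doc_counts, candidate)
            if kl < best_kl:
                best_index, best_kl = i, kl
        if best_index is None:
            break
        chosen.append(best_index)
        summary_counts.update(tokens[best_index])
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    reviews = [
        "The battery life is great and the battery charges fast.",
        "I bought it for my cousin.",
        "The screen is sharp but the battery life could be better.",
        "Shipping was slow.",
    ]
    for sentence in kl_extract(reviews, max_sentences=2):
        print(sentence)
```

In a hybrid pipeline of the kind described above, the sentences returned by such an extractive step would be joined and handed to the abstractive model for rewriting; that second stage is not sketched here.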
