Abstract

This work investigates summarizing the conversations that occur in the comments section of the UK newspaper The Guardian. In the comment summarization task, comments are clustered and then ranked within each cluster. The top-ranked comments from each cluster are used to give an overview of that cluster. Topic model clustering was found to give the highest agreement when evaluated against a human gold standard, compared with cosine distance clustering and k-means clustering. PageRank was found to be the preferred ranking method when compared with TF-IDF, mutual information gain, and Maximal Marginal Relevance, with rankings evaluated against sets of comments summarized by a journalist for the Guardian letters page.
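
The cluster-then-rank pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes cluster labels have already been produced by some clustering step (a topic model, for example), builds a cosine-similarity graph over simple whitespace-tokenized TF-IDF vectors within each cluster, and ranks comments with a weighted PageRank computed by power iteration. All function names, the damping factor of 0.85, and the iteration count are hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document (naive whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        # Smoothed IDF keeps every weight strictly positive.
        vectors.append({
            term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pagerank(sim, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over a similarity matrix."""
    n = len(sim)
    # Total outgoing edge weight per node, ignoring self-similarity.
    out = [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Nodes with no outgoing weight spread their score uniformly.
        dangling = sum(scores[i] for i in range(n) if out[i] == 0.0)
        new_scores = []
        for j in range(n):
            inbound = sum(
                scores[i] * sim[i][j] / out[i]
                for i in range(n)
                if i != j and out[i] > 0.0
            )
            new_scores.append(
                (1.0 - damping) / n + damping * (inbound + dangling / n)
            )
        scores = new_scores
    return scores


def summarize_clusters(comments, labels, top_k=1):
    """Return the top_k highest-ranked comments for each cluster label."""
    clusters = defaultdict(list)
    for comment, label in zip(comments, labels):
        clusters[label].append(comment)
    summary = {}
    for label, members in clusters.items():
        vectors = tfidf_vectors(members)
        sim = [[cosine(u, v) for v in vectors] for u in vectors]
        scores = pagerank(sim)
        order = sorted(range(len(members)), key=lambda i: -scores[i])
        summary[label] = [members[i] for i in order[:top_k]]
    return summary
```

Calling `summarize_clusters(comments, labels, top_k=2)` returns a dictionary that maps each cluster label to its two highest-ranked comments. Comments that share vocabulary with many others in the same cluster tend to rank highest, which is the intuition behind using a graph-based ranking to pick representative comments.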
