Abstract

This paper proposes a method for automatically generating memorabilia, i.e. a record of the important events, from news reports, with the aim of using machine learning techniques to produce the memorabilia of a specific enterprise or department over a given time period. First, news reports are clustered by text similarity using DBSCAN, a nonparametric, density-based clustering algorithm. Then, we propose a salience ranking model that scores each cluster on several aspects, such as news coverage, report forwarding, and the importance of the source website. Finally, time normalization and description generation are applied to the top-K clusters to produce the final memorabilia. Experiments on news reports crawled from the relevant website show that the proposed method can effectively discover important events in the corpus. This work explores the memorabilia generation task and provides a baseline system for it.
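The clustering and ranking steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not specify the text representation, the similarity measure, the DBSCAN parameters, or how the salience aspects are combined. The bag-of-words cosine distance, the `eps`/`min_pts` values, the report fields `forwards` and `source_weight`, the toy reports, and the weighted sum of max-normalized features in the hypothetical `salience_ranking` function are all assumptions made for illustration. Time normalization and description generation are not shown.

```python
import math
from collections import Counter


def tf_vector(text):
    """Assumed text representation: a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(vectors, eps, min_pts):
    """Plain DBSCAN; returns a cluster id per item, with -1 marking noise."""
    n = len(vectors)
    labels = [None] * n  # None = not yet visited

    def region(i):
        # eps-neighbourhood of item i (includes i itself)
        return [j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # former noise becomes a border point
                continue
            if labels[j] is not None:
                continue  # already assigned to this cluster
            labels[j] = cluster_id
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also core: keep expanding
        cluster_id += 1
    return labels


def salience_ranking(reports, labels, k, weights=(0.5, 0.3, 0.2)):
    """Hypothetical salience score: weighted sum of coverage, forwarding and
    source importance, each divided by its maximum over all clusters.
    Returns the ids of the top-k clusters, highest score first."""
    clusters = {}
    for report, label in zip(reports, labels):
        if label != -1:  # noise reports are not treated as events
            clusters.setdefault(label, []).append(report)
    features = {
        cid: (
            len(members),  # news coverage: number of reports in the cluster
            sum(r["forwards"] for r in members),  # total forwarding count
            sum(r["source_weight"] for r in members) / len(members),  # mean source importance
        )
        for cid, members in clusters.items()
    }
    maxima = [max((f[i] for f in features.values()), default=0) or 1 for i in range(3)]
    scores = {
        cid: sum(w * f[i] / maxima[i] for i, w in enumerate(weights))
        for cid, f in features.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up toy reports, for illustration only.
reports = [
    {"text": "company x announces new plant", "forwards": 120, "source_weight": 0.9},
    {"text": "company x announces new plant opening", "forwards": 80, "source_weight": 0.7},
    {"text": "x announces new plant", "forwards": 40, "source_weight": 0.5},
    {"text": "department y holds annual meeting", "forwards": 10, "source_weight": 0.6},
    {"text": "department y holds annual meeting today", "forwards": 5, "source_weight": 0.4},
    {"text": "weather report sunny", "forwards": 1, "source_weight": 0.1},
]
labels = dbscan([tf_vector(r["text"]) for r in reports], eps=0.2, min_pts=2)
print(labels)  # [0, 0, 0, 1, 1, -1]
print(salience_ranking(reports, labels, k=1))  # [0]
```

Under these assumptions, the three plant reports form one cluster, the two meeting reports form another, and the weather report is labeled noise (-1); the plant cluster ranks first. DBSCAN fits this step because, unlike k-means, it does not require the number of events to be fixed in advance and can leave isolated reports unclustered as noise.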
