Modeling and predicting the popularity of online news based on temporal and content-related features

Steven Van Canneyt, Thomas Demeester, Bart Dhoedt, Philip Leroux

doi:10.1007/s11042-017-4348-z

Open Access

https://doi.org/10.1007/s11042-017-4348-z

Abstract

As the market for globally available online news is large and still growing, online publishers compete strongly to reach the largest possible audience. An intelligent online publishing strategy is therefore of the highest importance to publishers. A prerequisite for optimizing any online strategy is to have trustworthy predictions of how popular new online content may become. This paper presents a novel methodology to model and predict the popularity of online news. We first introduce a new strategy and mathematical model to capture view patterns of online news. After a thorough analysis of such view patterns, we show that well-chosen base functions lead to suitable models, and show how the influence of day versus night on the total view patterns can be taken into account to further increase accuracy without making the models more complex. Second, we turn to the prediction of future popularity, given recently published content. Using a new real-world dataset, we show that combining features related to content, metadata, and temporal behavior leads to significantly improved predictions compared to existing approaches that only consider features based on the historical popularity of the considered articles. Whereas linear regression is traditionally used for the application under study, we show that the more expressive gradient tree boosting method proves beneficial for predicting news popularity.
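
To make the idea of gradient tree boosting concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes squared-error loss and depth-1 trees (stumps), and it runs on synthetic data invented purely for illustration: the two features (a hypothetical log count of early views and the publication hour), the night-time penalty, and all function names are assumptions that do not come from the paper or its dataset.

```python
import random

def fit_stump(X, residuals):
    """Find the single split (feature, threshold) that minimizes squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            err = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, thr, lm, rm)
    return best[1:]  # (feature index, threshold, left value, right value)

def fit_boosting(X, y, n_rounds=40, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, thr, lv, rv = fit_stump(X, residuals)
        stumps.append((j, thr, lv, rv))
        pred = [p + learning_rate * (lv if row[j] <= thr else rv)
                for row, p in zip(X, pred)]
    return base, learning_rate, stumps

def predict(model, row):
    base, lr, stumps = model
    return base + lr * sum(lv if row[j] <= thr else rv
                           for j, thr, lv, rv in stumps)

def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

# Synthetic, made-up data (NOT the paper's dataset): popularity grows with
# early views and drops for articles published at night (hour < 6).
rng = random.Random(0)
X, y = [], []
for _ in range(120):
    early = rng.uniform(0, 5)   # hypothetical log of early view count
    hour = rng.randrange(24)    # hypothetical publication hour
    night = 1.0 if hour < 6 else 0.0
    X.append([early, float(hour)])
    y.append(1.5 * early - 1.0 * night + rng.gauss(0, 0.2))

mean_y = sum(y) / len(y)
mse_mean = mse([mean_y] * len(y), y)          # constant-prediction baseline
model = fit_boosting(X, y)
mse_boost = mse([predict(model, row) for row in X], y)
print("training MSE, mean baseline:", mse_mean)
print("training MSE, boosted stumps:", mse_boost)
```

The sketch only compares training error against a constant baseline, so it says nothing about generalization or about the comparison with linear regression reported in the paper. In practice one would use a library implementation with deeper trees and evaluate on held-out articles.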

Full Text