Abstract

The popularization of the Internet has increased the amount of content produced and consumed on the web. To take advantage of this market, major content providers focused on video streaming, such as Netflix and Amazon Prime, have emerged. However, despite the large number and diversity of videos these providers make available, only a few attract the attention of most users. For example, in the data explored in this article, the 6% most popular videos account for 85% of total views. Predicting in advance which videos will be popular is not trivial, especially given the many variables that influence popularity. Nevertheless, a tool with this ability would be of great value for dimensioning network infrastructure and for recommending new content to users. This manuscript therefore examines the machine learning-based approaches that have been proposed for predicting web content popularity. To this end, we first survey the literature and elaborate a taxonomy that classifies models according to their predictive features, and we describe the state-of-the-art features and techniques used for this task. While analyzing previous works, we saw an opportunity to use textual features for video popularity prediction. We therefore also present a case study that combines features obtained through feature engineering and word embeddings to predict the popularity of a video. The first approach is based on predictive attributes defined by feature engineering. The second takes advantage of word embeddings from video descriptions and titles. We evaluated the proposed techniques on a set of videos from GloboPlay, the largest provider of video streaming services in Latin America. A combination of engineered features and embeddings with the Random Forest algorithm achieved the best result, with an accuracy of 87%.
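The concentration figure quoted above (a small share of videos capturing most views) can be computed directly from per-video view counts. The following is a minimal sketch, not the authors' code: the function name `top_share` and the view counts are made up for illustration, and the ranking rule (take the top `ceil(fraction * n)` videos) is an assumption.

```python
import math


def top_share(views, fraction):
    """Share of total views captured by the top `fraction` of videos.

    `views` is a list of per-video view counts; `fraction` is in (0, 1].
    """
    if not views:
        raise ValueError("views must be non-empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(views, reverse=True)
    # Assumption: round the cutoff up so at least one video is always included.
    k = max(1, math.ceil(fraction * len(ranked)))
    total = sum(ranked)
    if total == 0:
        return 0.0
    return sum(ranked[:k]) / total


# Synthetic counts for ten hypothetical videos (not data from the article).
toy_views = [70, 10, 5, 5, 2, 2, 2, 2, 1, 1]
print(top_share(toy_views, 0.1))  # share held by the single most viewed video
print(top_share(toy_views, 0.2))  # share held by the two most viewed videos
```

Applied to a real catalogue, the same function with `fraction=0.06` would give the kind of statistic the abstract reports.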

Highlights

  • The Internet has become one of the primary means of communication and information in the world

  • We propose two approaches aiming at predicting video popularity from a streaming service

  • We investigate the predictive power of each classifier when it is induced from engineered features, from word embeddings, and from both feature types combined, on a set of 9989 videos from GloboPlay’s streaming service
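To make the three feature configurations concrete, the sketch below builds one input vector per video from engineered features only, from text embeddings only, or from both concatenated. It is an illustration under stated assumptions rather than the authors' pipeline: the embedding table, the engineered feature values, and the helper names `mean_embedding` and `build_features` are hypothetical, and averaging word vectors is only one possible way to turn a title or description into a fixed-length vector. The classifier step (for example, a Random Forest) is deliberately left out.

```python
def mean_embedding(text, table, dim):
    """Average the vectors of known tokens; return zeros if none are known."""
    tokens = text.lower().split()
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def build_features(engineered, text, table, dim, mode="both"):
    """Return the feature vector for one video.

    mode: "engineered", "embeddings", or "both" (concatenation).
    """
    if mode == "engineered":
        return list(engineered)
    if mode == "embeddings":
        return mean_embedding(text, table, dim)
    if mode == "both":
        return list(engineered) + mean_embedding(text, table, dim)
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical 2-dimensional embedding table (real embeddings are much larger).
toy_table = {"goal": [1.0, 0.0], "final": [0.0, 1.0]}
# Hypothetical engineered features, e.g. duration in seconds and a binary flag.
toy_engineered = [120.0, 1.0]

print(build_features(toy_engineered, "Final goal highlights", toy_table, 2))
```

Each resulting vector could then be passed, together with a popularity label, to any standard classifier to compare the three configurations.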

Introduction

The Internet has become one of the primary means of communication and information in the world. In 2012, 2 billion people had access to the Internet, representing 30% of the world population [1]. The number of Internet users has since grown to 4.66 billion, representing 60% of the world population, driven by the increase in the use of smartphones and other smart devices [2]. According to the DataReportal website [3], the challenges imposed by COVID-19 led almost 300 million people to access the Internet for the first time in the last year. With the popularization of the Internet, streaming video services such as YouTube, Netflix, GloboPlay, and Amazon Prime have grown. In April 2021, Netflix had 208 million subscribers and Amazon Prime had 200 million subscribers [4] worldwide.
