Abstract

Predicting movie sales has been a research topic for decades: every year dozens of movies surprise investors, for better or worse, by performing very differently at the box office than initially expected. Past studies have reported mixed results on using movie critics' reviews as a source of information for predicting box office outcomes. Using social media as a predictor of movie success has likewise been a popular research topic. We analyze Hollywood and Bollywood movies from three years, covering two distinct geographic and cultural regions. We used Twitter to collect wisdom-of-the-crowd features (4.3 billion tweets, 1.41 TB compressed) and took critics' review scores from the review aggregator sites Metacritic and SahiNahi for Hollywood and Bollywood movies, respectively. We also used movie metadata, such as budget and runtime, for the prediction task. Using three different machine learning algorithms, we treated the task as a regression problem: predicting a movie's opening weekend revenue. Whereas past studies performed their analyses on much smaller datasets, our study covers a total of 533 movies. In addition to \(r^2\), we measured model quality using the mean absolute percentage error (MAPE), and we find that a Random Forest model based on all three feature sets (Metadata, Critics, Twitter) gives the best results in our analysis.
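The two quality measures named above, MAPE and \(r^2\), can be illustrated with a minimal sketch. The abstract does not state how the authors computed them, so the code below uses the standard textbook definitions. The function names `mape` and `r_squared` and the sample revenue figures are hypothetical and are not taken from the study.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Assumes every actual value is non-zero (true for revenues > 0).
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes the actual values are not all identical (SS_tot > 0).
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical opening-weekend revenues (in millions), for illustration only.
actual_revenue = [10.0, 20.0, 40.0]
predicted_revenue = [12.0, 18.0, 40.0]

mape_value = mape(actual_revenue, predicted_revenue)
r2_value = r_squared(actual_revenue, predicted_revenue)
print(f"MAPE = {mape_value:.2f}%, r^2 = {r2_value:.4f}")
```

MAPE expresses error relative to each movie's actual revenue, so a miss on a small release counts as much as a proportionally equal miss on a blockbuster. That is one plausible reason to report it alongside \(r^2\), which is dominated by the largest values.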
