Abstract

With the growth of the movie industry, it is becoming increasingly important for stakeholders to estimate the profit a movie is likely to make at the box office. In fact, among movies produced in the United States between 2000 and 2010, only 36% had box office revenues higher than their production budgets, which further highlights the importance of making the right investment decisions. To address this issue, this study uses several machine learning algorithms, including Logistic Regression, Support Vector Machine (SVM) and Multi Layer Perceptron (MLP), to predict the box office return of a movie from data available before its release. The models take 35 movie parameters from 3200 movies as inputs to predict the profit made by a movie and to classify its success on a scale from “flop” to “blockbuster” based on the generated revenue. An analysis of different machine learning architectures is also presented. Finally, a system is proposed that produces results comparable with existing research in this field and can predict the profit generated by a movie with a “one class away” accuracy of 85.31% without using any sales information.
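The abstract does not define the “one class away” metric. A minimal sketch follows, assuming the success classes are encoded as consecutive integers on an ordinal scale (0 for “flop” up to the highest class for “blockbuster”) and that a prediction counts as correct when it lands on the true class or an adjacent one. The function names and the sample labels are hypothetical and are not taken from the study.

```python
def exact_match_accuracy(y_true, y_pred):
    """Fraction of predictions that hit the true class exactly."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def one_class_away_accuracy(y_true, y_pred):
    """Fraction of predictions within one class of the true class.

    Assumes classes are ordinal integers, so |true - predicted| <= 1
    means the prediction is exact or off by a single class.
    """
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= 1)
    return hits / len(y_true)


# Hypothetical labels for six movies (illustration only, not study data).
y_true = [0, 1, 2, 3, 4, 2]
y_pred = [0, 2, 2, 1, 4, 3]

exact = exact_match_accuracy(y_true, y_pred)
relaxed = one_class_away_accuracy(y_true, y_pred)
```

Under this reading, the relaxed metric can never be lower than the exact-match metric, because every exact hit is also within one class of the truth.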

Highlights

  • The movie industry is one of the first forms of industrialized mass-entertainment and has exhibited remarkable growth in the last few decades bringing about a huge revenue for its stakeholders

  • A KNN classifier was used by Alsaffar and Omar (2015) for Malay movie reviews; a hybrid model consisting of Multi Layer Perceptron (MLP) and Naïve Bayes (NB) was proposed by Al-Batah et al. (2018) for Arabic movie reviews; and a Support Vector Machine (SVM) combined with genetic techniques (GSVM) and a KNN classifier were used by Mohamed et al. (2018) for the “Cornell Movie Review Dataset” (2004)

  • Linear kernel does not increase the dimension of the feature vector
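The point about the linear kernel can be illustrated with a small sketch. It contrasts the linear kernel, which is a plain dot product in the original feature space, with a homogeneous degree-2 polynomial kernel, whose implicit feature map has d² components for a d-dimensional input. The polynomial kernel is chosen here only for contrast; which kernels the study compared is not stated in this section, and all names and vectors below are hypothetical.

```python
from itertools import product


def linear_kernel(x, z):
    """Dot product in the original feature space (no added dimensions)."""
    return sum(a * b for a, b in zip(x, z))


def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: (x . z) ** 2."""
    return linear_kernel(x, z) ** 2


def poly2_feature_map(x):
    """Explicit feature map for poly2_kernel: all pairwise products x_i * x_j."""
    return [a * b for a, b in product(x, repeat=2)]


# Hypothetical 3-dimensional feature vectors (illustration only).
x = [1.0, 2.0, 3.0]
z = [0.5, -1.0, 2.0]

mapped_x = poly2_feature_map(x)
mapped_z = poly2_feature_map(z)

# The kernel value equals a dot product in the larger, explicit space.
explicit = linear_kernel(mapped_x, mapped_z)
implicit = poly2_kernel(x, z)
```

Here the linear kernel works on the 3 original components, while the degree-2 kernel corresponds to a 9-component space that is never built explicitly when the kernel function is used.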


Summary

Introduction

The movie industry is one of the first forms of industrialized mass-entertainment and has exhibited remarkable growth in the last few decades, bringing about huge revenue for its stakeholders. Shim and Pourhomayoun (2017) tried to predict the opening weekend revenue with a Linear Regression model. They collected Twitter data for 67 movies and considered the following features: number of tweets, number of positive and negative tweets, presence of special characters known as ‘emojis’ in tweets, number of theatres, budget, and the weather conditions on the opening weekend of the movie. Features such as the content rating of the film, the release country, and the popularity of the film on social networking sites were not considered. Along with this feature set, three different regression algorithms, Linear, Polynomial and Support Vector Regression (SVR), were applied. We decided to analyze the effectiveness of different machine learning algorithms for movie revenue prediction based on pre-release movie metadata without any sales information, because sales information is not available prior to the release of the film.

Methodology
50 Bingo accuracy
80.42 One class away accuracy
40 Logistic regression
Findings
Conclusion