Abstract

This study estimates the profit of a movie by building a predictive model with several Data Mining techniques, namely neural networks, regression and decision trees. The model yields a prediction of box office revenue. Three different formulations of the dependent variable were used (interval, categorical and binary) to study how each one influences the predictive results. Two metrics were used to determine the most accurate predictions: the misclassification error for the categorical models and the average squared error for the continuous one. In this study, the best predictive results were obtained with a multi-layer perceptron. Regarding the different formulations of the dependent variable, the multiclass model presents a much higher error rate than the others, which is explained by the larger number of classes to predict.
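The two metrics named above can be illustrated with a minimal sketch. It assumes the usual definitions: the misclassification error is the fraction of wrongly predicted class labels, and the average squared error is the sum of squared errors divided by the number of observations. The function names are hypothetical and are not taken from the study or from any SAS tool.

```python
from typing import Sequence


def misclassification_rate(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions whose class label differs from the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def average_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of squared differences, dividing by the number of observations (assumed definition)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

The first function applies to the categorical and binary models, the second to the interval model. Because the two metrics are on different scales, they support comparisons between models of the same target type rather than across target types.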

Highlights

  • The losses of a movie that is only slightly lucrative can contribute to the partial or even total financial downfall of a movie studio

  • We propose to model movie profitability and compare the results using three different dependent variables: 1) an interval variable with the value of profitability for each movie; 2) a categorical variable with the profitability values transformed into 9 classes; and 3) a binary variable indicating whether the movie resulted in a profit or a deficit

  • The methodology used in this project is called SEMMA (Sample, Explore, Modify, Model, Assess), which is a Data Mining process developed by the SAS Institute (REF to SEMMA)
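The three dependent variables listed in the highlights above can be sketched as follows. This is an illustration only: the class boundaries used in the study are not given here, so the 8 cut points below are hypothetical placeholders that merely produce 9 classes, and treating a break-even movie as a deficit is likewise an assumption.

```python
from bisect import bisect_right
from typing import Dict, Sequence

# Hypothetical cut points (in millions of dollars), NOT taken from the study:
# 8 ascending edges split the profit axis into 9 classes, numbered 0 to 8.
HYPOTHETICAL_EDGES = [-50.0, -10.0, 0.0, 10.0, 25.0, 50.0, 100.0, 200.0]


def binary_target(profit: float) -> int:
    """1 if the movie made a profit, 0 otherwise (break-even counts as 0 by assumption)."""
    return 1 if profit > 0 else 0


def class_target(profit: float, edges: Sequence[float] = HYPOTHETICAL_EDGES) -> int:
    """Index of the class containing profit, given ascending cut points."""
    return bisect_right(edges, profit)


def build_targets(profit: float) -> Dict[str, float]:
    """Derive the interval, categorical and binary targets from one profit value."""
    return {
        "interval": profit,
        "categorical": class_target(profit),
        "binary": binary_target(profit),
    }
```

For example, `build_targets(30.0)` keeps 30.0 as the interval target, assigns class 5 under the placeholder edges, and sets the binary target to 1.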


Summary

Introduction

The losses of a movie that is only slightly lucrative can contribute to the partial or even total financial downfall of a movie studio. Data from 2012 give a clearer picture of the importance of profitability in movie production: only 10% of the released movies were responsible for more than 68.8% of the total box office revenue of that year (Ghiassi et al., 2015). The field is one of the riskiest for investors because of this unpredictability. Creating three different predictive models will allow us to understand the phenomenon better and improve the accuracy of the predictions.

