Abstract

We develop a methodology to predict a movie's box office performance at the point of green-lighting, when only its script and estimated production budget are available. We extract three levels of textual features (genre and content, semantics, and bag-of-words) from scripts using screenwriting domain knowledge, human input, and natural language processing techniques. These textual variables define a distance metric across scripts, which then serves as the input to a kernel-based approach for assessing box office performance. We show that the proposed methodology predicts box office revenues more accurately than benchmark methods, achieving a 29 percent lower mean squared error (MSE).
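To make the idea of a kernel-based prediction from script distances concrete, here is a minimal sketch. The abstract does not specify the kernel, the bandwidth, or how the distance is computed, so the Gaussian kernel, the Nadaraya-Watson weighted average, the function names, and the toy numbers below are all assumptions chosen for illustration, not the authors' method.

```python
import math


def gaussian_kernel(distance, bandwidth):
    """Weight that decays smoothly as the distance between two scripts grows.

    The Gaussian form is an assumption; the abstract does not name a kernel.
    """
    return math.exp(-(distance ** 2) / (2.0 * bandwidth ** 2))


def kernel_predict(distances, revenues, bandwidth=1.0):
    """Nadaraya-Watson estimate: a kernel-weighted average of known revenues.

    distances[i] is the distance from the new script to comparison script i;
    revenues[i] is the observed box office revenue of comparison script i.
    """
    if len(distances) != len(revenues) or not distances:
        raise ValueError("need one revenue per distance, and at least one pair")
    weights = [gaussian_kernel(d, bandwidth) for d in distances]
    total = sum(weights)
    if total == 0.0:
        # Every script is too far away for this bandwidth; use the plain mean.
        return sum(revenues) / len(revenues)
    return sum(w * r for w, r in zip(weights, revenues)) / total


# Hypothetical toy data: distances from a new script to three past scripts,
# and the revenues of those past scripts (arbitrary units).
distances = [0.2, 1.5, 3.0]
revenues = [120.0, 45.0, 10.0]
prediction = kernel_predict(distances, revenues, bandwidth=1.0)
print(prediction)
```

In this sketch, scripts that are closer to the new script under the distance metric pull the prediction toward their own revenues, and the bandwidth controls how quickly that influence fades with distance.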
