Abstract. The global film industry has been proved to impose a significant impact on both culture and the economy, with box office revenue serving as a crucial indicator of a film's commercial success. This study utilizes data from the Kaggle "TMDB Box Office Prediction" competition, encompassing 3,000 films released between 1990 and 2018, to predict movie box office revenue using Random Forest, XGBoost algorithms, and deep learning models such as Bidirectional Long Short-Term Memory (Bidirectional LSTM) and Simple Recurrent Neural Network (SimpleRNN). The goal is to develop a model that accurately predicts movie box office. By comprehensively considering multiple factors such as budget, popularity and film characteristics, this study not only significantly improves the accuracy of box office prediction, but also provides a scientific basis for the formulation of film market strategies. The results demonstrate that the Bidirectional LSTM excels in handling complex time-series data, showing strong trend-capturing capabilities, while XGBoost exhibits greater robustness in dealing with complex data and outliers. These findings can not only provide guidance for making more effective strategies on film production and distribution, but also provide new directions for future research, such as delving into the impact of social media on box office and developing more sophisticated predictive models to adapt to changing market dynamics.
Read full abstract