Box office prediction is of great significance for investment risk assessment, class construction, promotion and distribution, and theater scheduling. However, existing prediction models select too few of the factors that influence movie box office, which limits their prediction accuracy. In this study, a total of 34 influencing factors in 11 categories, such as heat index, movie type, release date, creators, and first-day box office, were selected to study movie box office prediction. The Word2vec algorithm is used to construct a feature thesaurus of nouns in the movie domain; emotionally colored adjectives and verbs are used to construct a movie-domain emotional dictionary; and the TF-IDF algorithm is integrated to calculate emotional scores for movie comments. A prediction method based on comments and Multivariate Linear Regression (MLR) is designed to analyze the relationship between the influencing factors and movie box office. It provides a basis for predicting total box office and a decision-making reference for the movie industry and related management departments. To improve accuracy, comments are also incorporated as feature values in a prediction model based on a Convolutional Neural Network (CNN). The results show that the average prediction accuracy of MLR without comments, Back-Propagation Neural Network (BPNN), and CNN is 63.4%, 68.3%, and 71.9%, respectively. After integrating the comments, the average prediction accuracy of MLR and CNN improves by 16.1% and 11.8%, respectively.
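The comment-scoring step can be illustrated with a minimal sketch. The abstract does not give the exact scoring formula, so the version below is an assumption: each comment's score is the sum, over words found in the emotional dictionary, of polarity times term frequency times a smoothed inverse document frequency. The dictionary entries, the polarity values, and the function name `tfidf_sentiment_scores` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy emotional dictionary: word -> polarity.
# These entries and values are assumptions, not taken from the paper.
EMOTION_DICT = {"great": 1.0, "moving": 1.0, "boring": -1.0, "awful": -1.0}


def tfidf_sentiment_scores(comments):
    """Return one TF-IDF weighted sentiment score per comment (assumed formula)."""
    docs = [c.lower().split() for c in comments]
    n_docs = len(docs)
    # Document frequency: number of comments that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        if not doc:
            scores.append(0.0)
            continue
        score = 0.0
        for word, count in Counter(doc).items():
            polarity = EMOTION_DICT.get(word)
            if polarity is None:
                continue  # word carries no emotional coloring in the dictionary
            tf = count / len(doc)
            # Smoothed IDF keeps a positive weight for words seen in every comment.
            idf = math.log((1 + n_docs) / (1 + df[word])) + 1.0
            score += polarity * tf * idf
        scores.append(score)
    return scores


comments = [
    "great story and moving ending",
    "boring plot awful acting",
    "the plot is fine",
]
scores = tfidf_sentiment_scores(comments)
```

Under these assumptions, a per-movie average of such scores could serve as one additional feature column alongside the other influencing factors when fitting the MLR or CNN model.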