Abstract

Movie genre classification is a challenging task that has increasingly attracted the attention of researchers. The number of movie consumers interested in taking advantage of automatic movie genre classification is growing rapidly, thanks to the popularization of media streaming services. In this paper, we addressed the multi-label classification of movie genres in a multimodal way. To this end, we created a dataset composed of trailer video clips, subtitles, synopses, and movie posters from 152,622 movie titles of The Movie Database (TMDb). This large dataset was carefully curated, organized, and made available as a contribution of this work. We labeled each movie in the dataset according to a set of eighteen genre labels. In the experimental evaluation, we computed different kinds of descriptors, such as Mel-Frequency Cepstral Coefficients (MFCCs), the Statistical Spectrum Descriptor (SSD), Local Binary Patterns (LBP) extracted from spectrograms, Long Short-Term Memory (LSTM) networks, and Convolutional Neural Networks (CNNs). With these descriptors, we trained different monolithic classifiers using the Binary Relevance and ML-kNN techniques. In addition, we explored the combination of classifiers/features using a late fusion strategy. The fusion of an LSTM trained on synopses and another LSTM trained on the movie subtitles provided our best results in the F-Score (0.674) and AUC-PR (0.725) metrics. These results corroborate the existence of complementarity among classifiers trained on different sources of information in this field of application. As far as we know, this is the most comprehensive study developed in terms of diversity of multimedia sources of information to perform movie genre classification.
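
To give a concrete picture of the pipeline summarized above, the sketch below illustrates Binary Relevance multi-label classification combined with a simple probability-averaging late fusion of two feature views (for instance, synopsis and subtitle descriptors). It is a minimal illustration only, assuming random placeholder features, a logistic-regression base classifier, a 0.5 decision threshold, and a micro-averaged F-Score; these are not the descriptors, models, or settings reported in the paper.

    # Minimal sketch: Binary Relevance + late fusion for multi-label genre
    # classification. The random "synopsis" and "subtitle" feature matrices,
    # the 18-genre label matrix, the base classifier, and the 0.5 threshold
    # are illustrative assumptions, not the paper's actual setup.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.metrics import f1_score

    rng = np.random.default_rng(0)
    n_movies, n_genres = 1000, 18
    X_synopsis = rng.normal(size=(n_movies, 300))   # placeholder text features
    X_subtitle = rng.normal(size=(n_movies, 300))   # placeholder subtitle features
    Y = (rng.random((n_movies, n_genres)) < 0.15).astype(int)  # multi-label targets

    train, test = slice(0, 800), slice(800, None)

    # Binary Relevance: one independent binary classifier per genre label.
    clf_syn = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    clf_sub = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    clf_syn.fit(X_synopsis[train], Y[train])
    clf_sub.fit(X_subtitle[train], Y[train])

    # Late fusion: average the per-genre probabilities of the two views,
    # then threshold to obtain the final multi-label prediction.
    proba = (clf_syn.predict_proba(X_synopsis[test])
             + clf_sub.predict_proba(X_subtitle[test])) / 2.0
    Y_pred = (proba >= 0.5).astype(int)

    print("micro F-Score:", f1_score(Y[test], Y_pred, average="micro"))

The same fusion step applies unchanged if the per-view probabilities come from different model families (e.g., an LSTM over synopses and another over subtitles), since only their per-genre scores are combined.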
