Abstract

Movie genre classification is an active research area in machine learning; however, the content of movies can vary widely within a single genre label. We expand these 'coarse' genre labels by identifying 'fine-grained' contextual relationships within the multi-modal content of videos. By leveraging pre-trained 'expert' networks, we learn the influence of different combinations of modalities for multi-label genre classification. We then continue to fine-tune this 'coarse' genre classification network in a self-supervised manner to sub-divide the genres based on the multi-modal content of the videos. Our approach is demonstrated on a new multi-modal, 37,866,450-frame, 8,800-movie-trailer dataset, MMX-Trailer-20, which includes pre-computed audio, location, motion, and image embeddings.
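As a rough illustration of the fusion idea described above, the sketch below concatenates pre-computed per-modality embeddings and scores independent per-genre probabilities with a sigmoid output. This is a minimal assumption-laden sketch, not the paper's implementation: the embedding dimensions, concatenation-based fusion, 20-genre output, and random weights are all placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-computed "expert" embeddings for one trailer
# (the 128-d size per modality is an assumption for illustration).
audio = rng.standard_normal(128)
location = rng.standard_normal(128)
motion = rng.standard_normal(128)
image = rng.standard_normal(128)

# Late fusion by concatenation into a single feature vector.
fused = np.concatenate([audio, location, motion, image])  # shape (512,)

# A single linear layer stands in for the multi-label genre head;
# in practice these weights would be learned, not random.
n_genres = 20
W = rng.standard_normal((n_genres, fused.size)) * 0.01
b = np.zeros(n_genres)

logits = W @ fused + b
probs = 1.0 / (1.0 + np.exp(-logits))  # independent per-genre scores

# Multi-label prediction: every genre above a threshold is assigned,
# so a trailer can receive several genre labels at once.
predicted = probs > 0.5
```

Using independent sigmoids (rather than a softmax) is what makes the head multi-label: each genre is scored on its own, so one trailer can be, say, both "Action" and "Comedy".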
