Streaming services are increasingly leveraging Artificial Intelligence (AI) technologies for improved content cataloging, user experiences in content discovery, and personalization. A significant challenge in this domain is the automated assignment of microgenres to movies. This study introduces and evaluates approaches based on clustering, topic modeling, and word embedding to address this task. The evaluation employs a preprocessed dataset containing movie-related data—title tags, synopses, genres, and reviews—alongside a predefined microgenre list. Comparisons of three activation functions (binary step, ramp, and sigmoid) gauge their effectiveness in augmenting microgenre tags. Results demonstrate the superiority of the word embedding approach over clustering and topic modeling in terms of mean accuracy. Even more, the word embedding approach stands as the sole fully automated solution. Analysis indicates that incorporating review-based tags introduces noise and undermines accuracy. Besides, the word embedding approach yields optimal outcomes using the sigmoid function, effectively doubling assigned tags while maintaining matching quality. This sheds light on the potential of word embedding methods within the movie domain.
Read full abstract