Abstract

Music can invest even the tritest scene with meaning. Human perceptions of music and images are closely related, as both can evoke similar sensations and emotions. Advertising agencies often lay audio and music over their visuals to engage larger audiences and to convey the emotions associated with their content more effectively; matching visuals with music that expresses comparable feelings may help viewers perceive those emotions more vividly. This paper proposes a cross-modal neural network that recommends music for a given image by matching the two modalities in a common emotional vector space. A combined image-music pair dataset has been created using valence and arousal values: the images are drawn from the OASIS dataset, while the music is queried through the Spotify API and YouTube. A transfer-learning approach with Convolutional Neural Network architectures is used to train on this dataset, employing MobileNetV3, ResNet-18, and EfficientNetB4 for the images and SampleCNN for the raw audio clips. For any given input image, a list of the top-n music recommendations is output. The approach thus aims to match music to images based on deep hidden features in the shared emotion space of the two modalities.
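The abstract describes the pipeline only at a high level. As a minimal sketch of the idea, the PyTorch snippet below shows how a pretrained image backbone (ResNet-18 here) and a raw-waveform 1-D CNN in the spirit of SampleCNN could each be given a 2-unit (valence, arousal) regression head, and how top-n recommendations could then be ranked by distance in that shared emotion space. The RawAudioCNN class, the recommend helper, and all layer sizes are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
import torch
import torch.nn as nn
from torchvision import models

# Image branch: a pretrained ResNet-18 whose classifier head is replaced
# by a 2-unit regression head predicting (valence, arousal).
image_net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
image_net.fc = nn.Linear(image_net.fc.in_features, 2)

# Audio branch: a stand-in for SampleCNN -- stacked 1-D convolutions over
# raw waveform samples, ending in the same 2-unit (valence, arousal) head.
# (Layer sizes are illustrative, not the paper's configuration.)
class RawAudioCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=3, stride=3),
            nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128), nn.ReLU(), nn.MaxPool1d(3),
            nn.Conv1d(128, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128), nn.ReLU(), nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(128, 2)

    def forward(self, waveform):             # waveform: (batch, 1, samples)
        z = self.features(waveform).squeeze(-1)
        return self.head(z)                  # (batch, 2) -> (valence, arousal)

def recommend(image_va: torch.Tensor, music_va: torch.Tensor, n: int = 5):
    """Rank music clips by Euclidean distance between the image's predicted
    (valence, arousal) point and each clip's point; return top-n indices."""
    dists = torch.cdist(image_va.unsqueeze(0), music_va).squeeze(0)
    return torch.topk(dists, k=n, largest=False).indices
```

In this formulation, training each branch to regress the annotated valence-arousal labels is what aligns the two modalities: an image and a music clip match when their predicted emotion coordinates lie close together, so ranking by distance yields the top-n recommendations.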
