Because emotions are subjective and emotion categories are few, existing deep learning models struggle on their own to deliver objective, accurate, and flexible personalized music emotion recommendations. This paper introduces a deep learning approach that combines Generative Artificial Intelligence (GAI) with explicit physiological indicators to improve the model's intelligence, versatility, and automation. Physiological indicators such as Heart Rate Variability (HRV) and Galvanic Skin Response (GSR) can be measured with body-surface sensors and provide more precise information about changes in human emotional state. This research adopts a three-dimensional emotion model, comprising tension-arousal, energy-arousal, and valence axes, to characterize the correlation between music data and emotions and to assess recognition accuracy. On this basis, a music emotion classifier is designed that incorporates GAI algorithms to recommend music by matching users' physiological and emotional profiles with the emotional features of music. The classifier applies Mel-Frequency Cepstral Coefficient (MFCC) processing to transform audio into Mel-spectrograms as input features. The music emotion selection module adopts a Variational Autoencoder (VAE) GAI framework and integrates multi-scale parallel convolution and attention-mechanism modules. Experimental results show that the approach is competitive with existing deep learning architectures on the PMEmo, RAVDESS, and Soundtrack datasets. Moreover, owing to GAI's efficient classification, the model is suitable for resource-constrained mobile devices and other smart devices. The results of this study can be applied to emotion-based music recommendation systems, supporting emotional interventions and improving the effectiveness of exercise and music therapy.
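The matching step described above can be illustrated with a minimal sketch: both the user's inferred emotional state (derived from signals such as HRV and GSR) and each track's emotion features are placed in the same three-dimensional space (tension-arousal, energy-arousal, valence), and tracks are ranked by similarity. All names, the toy data, and the choice of cosine similarity here are illustrative assumptions, not the paper's actual implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two 3-D emotion vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def recommend(user_emotion, music_library, top_k=3):
    """Rank tracks by closeness of their emotion features to the
    user's emotion vector; return the top_k track titles."""
    scored = sorted(
        music_library.items(),
        key=lambda item: cosine_similarity(user_emotion, item[1]),
        reverse=True,
    )
    return [title for title, _ in scored[:top_k]]

# Hypothetical example: a calm, positive user state matched
# against a toy library of (tension, energy, valence) vectors.
user = (0.2, 0.3, 0.8)
library = {
    "ambient_piece": (0.1, 0.2, 0.7),
    "heavy_metal": (0.9, 0.9, -0.3),
    "upbeat_pop": (0.4, 0.8, 0.9),
}
print(recommend(user, library, top_k=2))  # → ['ambient_piece', 'upbeat_pop']
```

In the paper itself this matching is learned by the VAE-based classifier rather than computed with a fixed similarity measure; the sketch only shows the shape of the emotion-space matching idea.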