Verse1-Chorus-Verse2 Structure: A Stacked Ensemble Approach for Enhanced Music Emotion Recognition

Love Jhoye Moreno Raboy,Attaphongse Taparugssanagorn

doi:10.3390/app14135761

Abstract

In this study, we present a novel approach for music emotion recognition that utilizes a stacked ensemble of models integrating audio and lyric features within a structured song framework. Our methodology employs a sequence of six specialized base models, each designed to capture critical features from distinct song segments: verse1, chorus, and verse2. These models are integrated into a meta-learner, resulting in superior predictive performance, achieving an accuracy of 96.25%. A basic stacked ensemble model was also used in this study to independently run the audio and lyric features for each song segment. The six-input stacked ensemble model surpasses the capabilities of models analyzing song parts in isolation. The pronounced enhancement underscores the importance of a bimodal approach in capturing the full spectrum of musical emotions. Furthermore, our research not only opens new avenues for studying musical emotions but also provides a foundational framework for future investigations into the complex emotional aspects of music.

Full Text