The present work proposes a novel framework for emotion-driven procedural sound generation, termed SONEEG. The framework merges emotion recognition with dynamic sound synthesis to enhance user engagement in interactive digital environments. It uses physiological and emotional data to generate emotion-adaptive sound, leveraging datasets such as DREAMER and EMOPIA. The primary innovation is the dynamic capture of emotions, which are mapped onto a circumplex model of valence and arousal for precise classification. A Transformer-based architecture then synthesizes sound sequences conditioned on this emotional information. In addition, the framework incorporates a procedural audio generation module that combines machine learning with granular synthesis, wavetable synthesis, and physical modeling to produce adaptive, personalized soundscapes. A user study with 64 subjects evaluated the framework through subjective ratings of sound quality and emotional fidelity. The analysis revealed differences in sound quality across samples, with some receiving consistently high scores and others mixed reviews. The emotion recognition model reached 70.3% overall accuracy; it distinguished high-arousal emotions reliably but struggled to separate emotions of similar arousal. The framework can be applied in fields such as healthcare, education, entertainment, and marketing, where real-time emotion recognition can deliver personalized, adaptive experiences. Future work includes incorporating multimodal emotion recognition and exploiting physiological data to better understand users' emotions.
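As a rough illustration of the valence-arousal mapping mentioned above, the sketch below shows how a continuous emotion estimate could be assigned to a circumplex quadrant. The class names, value ranges, and zero thresholds are illustrative assumptions, not the paper's actual labeling scheme.

```python
from dataclasses import dataclass

@dataclass
class EmotionPoint:
    valence: float  # assumed range: -1.0 (negative) to 1.0 (positive)
    arousal: float  # assumed range: -1.0 (calm) to 1.0 (excited)

def classify_quadrant(p: EmotionPoint) -> str:
    """Map a valence-arousal point to one of four circumplex quadrants.

    Quadrant labels are hypothetical placeholders for whatever emotion
    categories the recognition model actually uses.
    """
    if p.valence >= 0 and p.arousal >= 0:
        return "happy/excited"   # high valence, high arousal
    if p.valence < 0 and p.arousal >= 0:
        return "angry/tense"     # low valence, high arousal
    if p.valence < 0 and p.arousal < 0:
        return "sad/depressed"   # low valence, low arousal
    return "calm/relaxed"        # high valence, low arousal

# Example: a mildly positive, strongly aroused reading
print(classify_quadrant(EmotionPoint(valence=0.3, arousal=0.8)))  # happy/excited
```

A quadrant label like this could then serve as the conditioning signal for the Transformer-based synthesis stage, though the paper's conditioning may be finer-grained than four discrete classes.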