By integrating state-of-the-art methods of computational auditory scene analysis (CASA) with artificial cognitive feedback, the project challenges human performance in auditory-scene-analysis and quality-of-experience tasks. Human listeners are regarded as multi-modal agents that develop their concept of the world through exploratory interaction. The central goal of the project is to develop an intelligent computational model of active auditory perception and experience in a multi-modal context. The resulting system framework will form a structural link from binaural perception to judgment and action, realized by interleaved signal-driven (bottom-up) and hypothesis-driven (top-down) feedback processing within an innovative expert-system architecture. A conceptual overview of the project framework is presented, along with insight into the current state of research, focusing on CASA-related search-and-rescue (S&R) scenarios. In these scenarios, an autonomous robot is equipped with auditory and visual feature-analysis facilities that provide it with bottom-up information about its environment. Top-down evaluation of the extracted features then closes cognitive feedback loops that allow the machine to adapt to complex S&R scenarios and to perform significantly better than would be possible with feed-forward control alone. [Work performed in the context of the EU project Two!Ears.]
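The following toy sketch illustrates the kind of interleaved bottom-up/top-down loop the abstract describes, as opposed to a purely feed-forward pipeline: the hypothesis-driven stage feeds an exploratory action (here, a head turn) back into the signal-driven feature analysis, progressively sharpening the source hypothesis. All names, the Gaussian cue model, and the head-turn policy are illustrative assumptions, not the Two!Ears architecture or API.

```python
"""Minimal sketch of interleaved bottom-up / top-down feedback for active
auditory localization. Everything here is an illustrative assumption."""

import random

TRUE_AZIMUTH = 40.0  # deg; hidden ground truth of the simulated sound source


def bottom_up_cue(head_direction):
    """Signal-driven stage: a noisy azimuth estimate relative to the head.
    Noise grows with eccentricity, mimicking poorer binaural cues off-axis."""
    offset = TRUE_AZIMUTH - head_direction
    noise = random.gauss(0.0, 1.0 + 0.1 * abs(offset))
    return offset + noise


def top_down_action(hypothesis, head_direction):
    """Hypothesis-driven stage: turn the head toward the current source
    hypothesis so that subsequent bottom-up estimates become more reliable."""
    return head_direction + 0.5 * (hypothesis - head_direction)


hypothesis, head = 0.0, 0.0  # initial source hypothesis and head direction (deg)
for step in range(10):
    relative = bottom_up_cue(head)              # bottom-up: analyze the signals
    estimate = head + relative                  # map the cue to world coordinates
    hypothesis += 0.5 * (estimate - hypothesis) # update the source hypothesis
    head = top_down_action(hypothesis, head)    # top-down: exploratory action
    print(f"step {step}: head = {head:6.1f} deg, hypothesis = {hypothesis:6.1f} deg")
```

In a feed-forward system, the head direction would never change and off-axis cue noise would persist; closing the loop lets the agent act on its hypothesis so that perception and action refine each other, which is the sense in which the framework links binaural perception to judgment and action.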