Abstract
The neural mechanisms underpinning auditory scene analysis and object formation have been of intense research interest over the past two decades. Fundamentally, however, we live in a multisensory environment. Even Cherry, in his original paper, posited "lip reading" as a way for us to solve the cocktail party problem. Yet how different aspects of visual cues (e.g., timing, linguistic information) help listeners follow a conversation in a complex acoustic scene is still not well understood. In this talk, we present a theoretical framework for studying audiovisual scene analysis, extrapolated from the unisensory object-based attention literature, and pose the following questions: How do we define a multimodal object? What are the predictions of unisensory object-based attention theory when we apply it to the audiovisual domain? What are the conceptual models for testing the different neural mechanisms that underpin audiovisual scene analysis? Answering these questions would move us closer to addressing the cocktail party problem in real-world settings, as well as help us create, de novo, audiovisual scenes that are more engaging in augmented/virtual reality environments.