Abstract

Attention plays a vital role in helping us navigate our acoustic surroundings. It guides sensory processing to sift through the cacophony of sounds in everyday scenes and modulates the representation of target sounds relative to distractors. While its conceptual role is well established, there are competing theories as to how attentional feedback operates in the brain and how its mechanistic underpinnings can be incorporated into computational systems. These interpretations differ in the manner in which attentional feedback operates as an information bottleneck to aid perception. One interpretation is that attention adapts the sensory mapping itself to encode only the target cues. An alternative interpretation is that attention behaves as a gain modulator that enhances the target cues after they are encoded. Further, the theory of temporal coherence states that attention seeks to bind temporally coherent features relative to anchor features, as determined by prior knowledge of target objects. In this work, we study these competing theories within a deep-network framework for the task of music source separation. We show that these theories complement each other and, when employed together, yield state-of-the-art performance in music source separation. We further show that systems with attentional mechanisms can be made to scale to mismatched conditions by retuning only the attentional modules with minimal data.
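As a rough illustration only (not the authors' implementation), the sketch below contrasts the two interpretations of attentional feedback in PyTorch-style Python, and adds a simple temporal-coherence score. The module names, the task embedding `z`, and the coherence helper are all hypothetical placeholders.

```python
import torch
import torch.nn as nn


class AdaptiveEncoder(nn.Module):
    """Interpretation 1: attention adapts the sensory mapping itself.
    A task/target embedding z conditions the encoding (here via a
    feature-wise affine transform), so only target cues are encoded."""
    def __init__(self, n_in, n_feat, n_task):
        super().__init__()
        self.encode = nn.Linear(n_in, n_feat)
        self.scale = nn.Linear(n_task, n_feat)  # gamma(z)
        self.shift = nn.Linear(n_task, n_feat)  # beta(z)

    def forward(self, x, z):
        h = self.encode(x)
        return self.scale(z) * h + self.shift(z)


class GainModulation(nn.Module):
    """Interpretation 2: attention is a post-encoding gain.
    The encoder itself is task-independent; a multiplicative gate
    enhances target features after they are encoded."""
    def __init__(self, n_in, n_feat, n_task):
        super().__init__()
        self.encode = nn.Linear(n_in, n_feat)
        self.gate = nn.Linear(n_task, n_feat)

    def forward(self, x, z):
        h = self.encode(x)                # task-independent encoding
        g = torch.sigmoid(self.gate(z))   # gain in [0, 1]
        return g * h                      # enhance target cues


def temporal_coherence(features, anchor):
    """Correlation over time between each feature channel (T, n_feat)
    and an anchor channel (T,); channels that co-vary with the anchor
    are candidates for binding to the target object."""
    f = features - features.mean(dim=0, keepdim=True)
    a = anchor - anchor.mean()
    num = (f * a.unsqueeze(1)).sum(dim=0)
    den = f.norm(dim=0) * a.norm() + 1e-8
    return num / den
```

In this reading, the abstract's final claim corresponds to freezing the encoders and retuning only the conditioning modules (`scale`, `shift`, `gate`) on a small amount of mismatched data.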
