Abstract
In scenes with noise and overlapping speakers, extracting the audio track of each individual speaker by direction is crucial for immersive and interactive spatial audio systems. Although neural networks have been successful in this task, existing steering approaches for adjusting the extraction direction mainly target spatial audio captured directly by microphone arrays, whereas directional speech extraction from Ambisonics spatial audio is less well studied. To encode the target direction as input to the neural network, this paper proposes two Ambisonics directional features based on the spatial feature difference and the beamforming principle: the relative harmonic difference and the directional signal enhancement ratio. Exploiting the rotation property of Ambisonics, we also propose a rotary steering pre-processing step that aligns the target speaker's direction with a fixed reference by inversely rotating the sound field, thereby reducing multi-directional extraction to fixed-directional extraction. Finally, we integrate the proposed approaches with existing temporal-spectral-spatial filtering neural networks to establish a generalized framework for steerable speech extraction, and we conduct experiments on a simulated Ambisonics dataset containing multiple speakers and noise sources. The experiments show that the proposed approaches outperform existing conditional steering and can be applied to various existing neural network architectures.
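To make the rotary steering idea concrete, the following is a minimal sketch rather than the paper's implementation. It assumes first-order Ambisonics in ACN channel order (W, Y, Z, X) with SN3D normalization and an ideal plane-wave source. The abstract does not specify the order or convention, and the function names `encode_foa` and `rotate_to_reference` are hypothetical. The sketch inversely rotates the sound field so that a source at a known azimuth and elevation lands on the fixed reference direction (azimuth 0, elevation 0).

```python
import numpy as np

def encode_foa(signal, azimuth, elevation):
    """Encode a mono signal as an ideal plane wave in first-order Ambisonics.

    Assumed convention (not taken from the paper): ACN order (W, Y, Z, X),
    SN3D normalization, angles in radians.
    """
    signal = np.asarray(signal, dtype=float)
    gains = np.array([
        1.0,                                   # W: omnidirectional
        np.sin(azimuth) * np.cos(elevation),   # Y: left-right
        np.sin(elevation),                     # Z: up-down
        np.cos(azimuth) * np.cos(elevation),   # X: front-back
    ])
    return gains[:, None] * signal[None, :]    # shape (4, num_samples)

def rotate_to_reference(foa, azimuth, elevation):
    """Inversely rotate the sound field so (azimuth, elevation) maps to (0, 0)."""
    w, y, z, x = foa
    # Step 1: undo the azimuth with a rotation about the z-axis.
    x1 = x * np.cos(azimuth) + y * np.sin(azimuth)
    y1 = -x * np.sin(azimuth) + y * np.cos(azimuth)
    # Step 2: undo the elevation with a rotation about the y-axis.
    x2 = x1 * np.cos(elevation) + z * np.sin(elevation)
    z2 = -x1 * np.sin(elevation) + z * np.cos(elevation)
    # W is rotation-invariant; return channels in the same ACN order.
    return np.stack([w, y1, z2, x2])

# Usage: a source at an arbitrary direction ends up at the frontal reference.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16)
target_az, target_el = 1.0, 0.4
rotated = rotate_to_reference(encode_foa(speech, target_az, target_el),
                              target_az, target_el)
reference = encode_foa(speech, 0.0, 0.0)
```

Under these assumptions, `rotated` matches `reference` up to floating-point error, so a network downstream only ever has to extract the speaker at one fixed direction. Higher-order Ambisonics would require the corresponding higher-order rotation matrices, which this sketch does not cover.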