Abstract
Spherical harmonic decomposition expresses the sound pressure at the microphones as independent functions of frequency and of the azimuth and elevation of the source and microphone locations. This decomposition enables the extraction of two feature sets that carry different information about the azimuth and elevation of the source for direction of arrival (DOA) estimation. These features can be given as input to a learning approach that estimates azimuth and elevation separately, breaking DOA estimation into two independent sub-problems. This reduces the computational complexity compared with joint DOA estimation and allows a straightforward extension to denser DOA search grids. The contribution of this paper is threefold. First, we propose spherical harmonic magnitude and phase features and discuss the information they carry about the azimuth and elevation of the source. Second, we propose convolutional neural network architectures for DOA estimation. Third, we analyse the training and run-time computational complexities and extend the DOA estimation approach to a dense DOA search grid rather than restricting it to a sparse one. The performance of conventional DOA estimation approaches degrades in noisy and reverberant environments, and several improvements to existing approaches have recently been proposed. However, to the best of the authors' knowledge, learning approaches to DOA estimation with dense DOA search grids and few frames in the context of spherical arrays have not been proposed. Performance evaluation is carried out using simulated as well as real datasets. The proposed approach is also evaluated on the LOCATA dataset in the context of a moving source.
The results are motivating enough to consider the application of the proposed method in practical scenarios.
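As a rough illustration of the spherical harmonic decomposition underlying the proposed features, the sketch below encodes microphone pressures into spherical harmonic coefficients via a least-squares fit and extracts magnitude and phase features. The microphone geometry, the SH order, and the plain least-squares encoding are illustrative assumptions, not the paper's exact pipeline (which may include mode-strength compensation and frequency-dependent processing):

```python
import numpy as np
from scipy.special import sph_harm  # note: deprecated in newer SciPy in favour of sph_harm_y

def sh_matrix(order, azimuths, elevations):
    """Spherical harmonic matrix Y of shape (Q, (order+1)**2) for Q microphones."""
    cols = []
    for n in range(order + 1):
        for m in range(-n, n + 1):
            # scipy's sph_harm takes (m, n, azimuth, polar angle);
            # polar angle = pi/2 - elevation
            cols.append(sph_harm(m, n, azimuths, np.pi / 2 - elevations))
    return np.stack(cols, axis=1)

def sh_features(pressures, order, azimuths, elevations):
    """Least-squares SH coefficients, returned as magnitude and phase features."""
    Y = sh_matrix(order, azimuths, elevations)
    coeffs, *_ = np.linalg.lstsq(Y, pressures, rcond=None)
    return np.abs(coeffs), np.angle(coeffs)

# Toy setup: 32 microphones at random directions on a sphere, SH order 3
rng = np.random.default_rng(0)
Q = 32
az = rng.uniform(0.0, 2.0 * np.pi, Q)
el = rng.uniform(-np.pi / 2, np.pi / 2, Q)
p = rng.standard_normal(Q) + 1j * rng.standard_normal(Q)  # one frequency bin
mag, phase = sh_features(p, order=3, azimuths=az, elevations=el)
```

In this sketch each frequency bin yields `(order + 1)**2` complex coefficients; their magnitudes and phases would form the two feature sets fed to separate azimuth and elevation estimators.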
IEEE/ACM Transactions on Audio, Speech, and Language Processing