Abstract

In recent years, Convolutional Neural Networks have been increasingly applied to the study of sound signals. The main reason is the translational invariance of convolution in time and space, which helps overcome the diversity of sound signals. However, sound direction recognition still faces problems such as overly large microphone arrays and feature selection. This paper proposes a sound direction recognition method that uses a simulated human head with a microphone at each ear. In theory, two microphones cannot distinguish front from rear directions. However, when the raw data of the two channels is used as the input of the convolutional neural network, the front/rear resolution rate can exceed 0.9. For comparison, we also used the delay feature based on generalized cross-correlation (GCC) for sound direction recognition. Finally, we conducted experiments that used probability distributions to identify more directions.

Highlights

  • In the field of artificial intelligence, research on computer hearing [8] remains at an earlier stage than research on computer vision [10]

  • This paper proposes a sound direction recognition method that uses a simulated human head with a microphone at each ear

  • The delay estimation is built on the generalized correlation function based on periodic cross-spectral density and on the generalized cross-correlation algorithm based on the cross-power spectrum, an algorithm suggested by Knapp et al. To obtain a more accurate time delay estimate, researchers have focused on the design of the microphone array, such as linear arrays [6], circular arrays [15], distributed arrays [3], and non-coplanar arrays of any shape [1]


Summary

Introduction

In the field of artificial intelligence, research on computer hearing [8] remains at an earlier stage than research on computer vision [10]. If directional noise occurs in practical applications and its energy is comparable to that of the sound source, the noise source may be mistaken for the sound source when the decision is based on the largest features in the correlation matrix. This method needs to search the entire space to determine the source, its estimation accuracy depends on how finely the space is subdivided, and the computation is complex. Extracted features mainly include the time delay estimation feature, the covariance matrix, and the short-time power spectral density function spectrum. In [18], the delay feature is further extracted, and convolutional neural networks and multilayer perceptrons are used for source coordinate localization. These methods use a complex microphone array and do not achieve end-to-end sound direction recognition.
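
To make the time delay estimation feature concrete, the following is a minimal sketch of a generalized cross-correlation delay estimate between two microphone channels. It is not the paper's implementation: the PHAT weighting, the function name `gcc_phat`, the `max_tau` parameter, the 16 kHz sample rate, and the synthetic noise signal are all assumptions chosen for illustration.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (seconds) of `sig` relative to `ref`.

    Uses generalized cross-correlation with PHAT weighting (an assumed
    weighting; the source only names GCC). Positive result: `sig` lags `ref`.
    """
    n = len(sig) + len(ref)            # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)         # cross-power spectrum
    cross /= np.abs(cross) + 1e-12     # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)      # generalized cross-correlation

    max_shift = n // 2
    if max_tau is not None:            # restrict search to plausible delays
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs

# Synthetic check: the right channel is the left channel delayed by 5 samples.
fs = 16000                              # hypothetical sample rate
rng = np.random.default_rng(0)
left = rng.standard_normal(4096)
right = np.concatenate((np.zeros(5), left[:-5]))
tau = gcc_phat(right, left, fs, max_tau=0.001)
delay_samples = round(tau * fs)
```

With two ear-mounted microphones, a single delay like this maps to the same value for mirrored front and rear directions, which is the ambiguity the raw-waveform CNN input is meant to address.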

Overview of the principle of acoustic scattering
The proposed approach
Data acquisition and preprocessing
Feature extraction
CNN architecture
Experimental preparation and data acquisition
Data set
Experimental results based on different data sizes
Analysis of recognition results of different features
Influence of different indoor environments on experimental results
The effect of different data sizes on the recognition accuracy
The influence of network structure on the recognition accuracy
High resolution directional model based on probability distribution
Conclusions