Microphone arrays have been used to separate sound sources to improve speech recognition in noisy environments. We propose a method using image signal processing to achieve highly accurate sound source separation. The microphone array is 1 m long and consists of eight microphones. Temporal sequences of the sound pressures obtained from the eight microphones are converted into sequences of luminance. These are arranged in parallel in an image, which is referred to as a spatio-temporal sound pressure distribution image. Sparse modeling using L1 regularization (Lasso) is applied to the image to restore a high-resolution image. The spatial spectrum of the restored image is obtained using a two-dimensional fast Fourier transform (FFT) algorithm. In this spectrum, the angle of a line through the center denotes the arrival direction of sound, and the distance from the center indicates its frequency. By extracting a line from the spectrum, the sound source can be separated. A computational experiment revealed that the high-resolution sound field that would be obtained by a 512-microphone array could be restored using the proposed method. Moreover, the signal-to-noise ratio (SNR) was improved by 32.1 dB when two sounds arrived 45° apart, indicating sufficient performance to extract a desired sound. [Work supported by JSPS KAKENHI Grant (16K06384).]
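The relation between a line in the two-dimensional spectrum and the arrival direction can be illustrated with a minimal sketch. It is not the authors' implementation: it skips the Lasso restoration step and instead simulates the dense 512-microphone image directly, assumes a single plane wave, a speed of sound of 343 m/s, and a 16 kHz sampling rate, and the function names `sound_image` and `estimate_direction` are hypothetical. For a plane wave arriving at angle θ from broadside, the spectral peak lies at spatial frequency k = -f sin θ / c, so the slope k / f of the line through the center depends only on θ.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed, not from the abstract)


def sound_image(angle_deg, freq_hz, n_mics=512, length_m=1.0,
                fs=16000.0, n_samples=256):
    """Simulate a spatio-temporal sound pressure image (mics x time)
    for one plane wave arriving at angle_deg from broadside."""
    x = np.linspace(0.0, length_m, n_mics, endpoint=False)  # mic positions
    t = np.arange(n_samples) / fs                           # sample times
    # Each microphone sees the wave delayed by x * sin(theta) / c.
    delay = x[:, None] * np.sin(np.deg2rad(angle_deg)) / C
    image = np.cos(2.0 * np.pi * freq_hz * (t[None, :] - delay))
    return image, x[1] - x[0], 1.0 / fs


def estimate_direction(image, dx, dt, pad=8):
    """Locate the strongest peak of the 2-D spectrum and convert the
    slope of the line through the origin into an arrival angle."""
    n_mics, n_samples = image.shape
    # A Hann window on both axes reduces spectral leakage.
    window = np.hanning(n_mics)[:, None] * np.hanning(n_samples)[None, :]
    # Zero-pad the spatial axis for a finer spatial-frequency grid.
    n_k = pad * n_mics
    spectrum = np.abs(np.fft.fft2(image * window, s=(n_k, n_samples)))
    k_axis = np.fft.fftfreq(n_k, d=dx)        # spatial frequency, cycles/m
    f_axis = np.fft.fftfreq(n_samples, d=dt)  # temporal frequency, Hz
    # A real signal has a mirrored spectrum; keep positive frequencies only.
    positive = f_axis > 0
    half = spectrum[:, positive]
    i, j = np.unravel_index(np.argmax(half), half.shape)
    k_peak, f_peak = k_axis[i], f_axis[positive][j]
    # Peak sits at k = -f * sin(theta) / c, so invert that relation.
    sin_theta = np.clip(-C * k_peak / f_peak, -1.0, 1.0)
    return float(np.rad2deg(np.arcsin(sin_theta))), float(f_peak)


if __name__ == "__main__":
    img, dx, dt = sound_image(angle_deg=30.0, freq_hz=2000.0)
    angle, freq = estimate_direction(img, dx, dt)
    print(f"estimated angle: {angle:.1f} deg, frequency: {freq:.0f} Hz")
```

Separating a source would then amount to keeping only the spectral bins near the line for the desired direction and applying an inverse FFT; that masking step is left out of this sketch.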