Abstract

In this paper, we propose a new microphone array signal processing technique that virtually increases the number of microphones by generating extra signal channels from real microphone signals. Microphone array signal processing methods such as speech enhancement are effective for improving the quality of various speech applications such as speech recognition and voice communication systems. However, the performance of speech enhancement and other signal processing methods depends on the number of microphones, so special equipment such as a multichannel A/D converter or a microphone array is needed to achieve high processing performance. Our aim was therefore to establish a technique that improves the performance of array signal processing with a small number of microphones, specifically by virtually increasing the number of channels through the synthesis of virtual microphone signals, i.e., extra signal channels, from two channels of microphone signals. Each virtual microphone signal is generated by interpolating a short-time Fourier transform (STFT) representation of the microphone signals, with the phase and amplitude interpolated separately: the phase is linearly interpolated on the basis of a sound propagation model, and the amplitude is nonlinearly interpolated on the basis of the β-divergence. We also performed speech enhancement experiments using a maximum signal-to-noise ratio (SNR) beamformer equipped with virtual microphones and evaluated the performance improvement obtained by introducing them.
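As a rough illustration of the interpolation described above, the following sketch generates one virtual microphone channel from the STFTs of two real microphones. It assumes linear interpolation of the wrapped inter-channel phase difference and a weighted power-mean form for the β-divergence-based amplitude interpolation; the function name, the parameter names alpha and beta, and the closed-form expressions are illustrative assumptions rather than the exact formulation used in the paper.

```python
import numpy as np

def virtual_microphone_stft(X1, X2, alpha=0.5, beta=2.0, eps=1e-12):
    """Sketch: synthesize a virtual microphone signal in the STFT domain.

    X1, X2 : complex STFT arrays (frequency bins x frames) of the two real microphones.
    alpha  : interpolation position between microphone 1 (alpha=0) and microphone 2 (alpha=1).
    beta   : beta-divergence parameter; beta = 2 reduces the amplitude rule to
             linear interpolation, smaller beta makes it more strongly nonlinear.
    """
    A1, A2 = np.abs(X1), np.abs(X2)
    phi1, phi2 = np.angle(X1), np.angle(X2)

    # Phase: linear interpolation of the inter-channel phase difference,
    # wrapped to (-pi, pi], consistent with a plane-wave propagation model.
    dphi = np.angle(np.exp(1j * (phi2 - phi1)))
    phi_v = phi1 + alpha * dphi

    # Amplitude: weighted power mean, i.e. the minimizer of a weighted sum of
    # beta-divergences to the two observed amplitudes (assumed closed form).
    if np.isclose(beta, 1.0):
        # Limit beta -> 1 corresponds to a weighted geometric mean.
        A_v = (A1 + eps) ** (1.0 - alpha) * (A2 + eps) ** alpha
    else:
        p = beta - 1.0
        A_v = ((1.0 - alpha) * (A1 + eps) ** p + alpha * (A2 + eps) ** p) ** (1.0 / p)

    return A_v * np.exp(1j * phi_v)
```

In this sketch, several virtual channels can be produced by evaluating the function at different values of alpha, and the real plus virtual channels can then be passed together to a multichannel speech enhancement method such as a maximum SNR beamformer.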

Highlights

  • Speech processing applications, such as voice communication and speech recognition systems, have become more common in recent years

  • A new array signal processing technique involving virtual microphones increases the number of channels virtually and improves the performance of speech enhancement

  • Virtual microphone signals are generated by the interpolation of the phase and amplitude of a complex signal

Introduction

Speech processing applications, such as voice communication and speech recognition systems, have become more common in recent years. One typical speech enhancement approach is microphone array signal processing, which uses spatial information obtained with multiple microphones [1]. Underdetermined blind source separation (BSS), i.e., separation of a mixture of sources whose number exceeds the number of microphones, has been widely studied as a typical framework for array signal processing with a small number of microphones [2]. In this problem, conventional linear array signal processing is ineffective because nontarget sources can be canceled accurately by linear processing only when there are fewer sources than microphones. The use of time-frequency masks instead leads to excessive, discontinuous zero padding of the extracted signals, so they tend to contain musical noise, which is undesirable for audio applications.
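To make the musical-noise issue concrete, the toy sketch below applies a hard binary time-frequency mask to a two-channel STFT; the channel-magnitude comparison used as the mask criterion is an illustrative assumption, not the masking rule of any particular underdetermined BSS method.

```python
import numpy as np

def binary_tf_mask_separation(X1, X2):
    """Toy illustration of binary time-frequency masking with two channels.

    X1, X2 : complex STFT arrays (frequency bins x frames).
    Returns a masked estimate of the signal dominant in channel 1.
    """
    # Assign each time-frequency bin to the channel with the larger magnitude;
    # bins assigned to the other channel are set to zero ("zero padding").
    mask = (np.abs(X1) >= np.abs(X2)).astype(float)
    return mask * X1
```

Because the 0/1 decisions switch abruptly from bin to bin and frame to frame, the zeroed regions create isolated spectral components that are perceived as musical noise after resynthesis, which is the motivation for the linear-filtering approach with virtually increased channels pursued in this paper.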
