Abstract

This paper proposes a novel approach to reducing the word error rate (WER) of an automatic speech recognition (ASR) system in a noisy, reverberant room. The approach integrates beamforming, dereverberation, and Ambisonics. Based on the formula demonstrated in the paper, the proposed system synthesizes the signals at desired points on the sphere surface from a combination of the 32 signals of a uniform spherical microphone array (USMA). The method uses a non-parametric sound field reproduction technique in the spherical harmonics domain (SHD), and a newly proposed geometry determines the locations of the desired points. In addition to improving dereverberation performance, the proposed method improves the beamformer's directivity factor (DF) and white noise gain (WNG). The results show that objective metrics such as PESQ improve significantly and that the WER of the Kaldi and WeNet ASR systems is reduced considerably.
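
For readers unfamiliar with the headline metric, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name and the example sentences are hypothetical and are not taken from the paper; scoring in practice usually also applies text normalization, which this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one substituted word out of four.
    print(word_error_rate("turn the lights on", "turn the light on"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.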
