A proposed method to improve the WER of an ASR system in the noisy reverberant room

Mohammad Ebrahim Sadeghi,Hamid Sheikhzadeh,Mohammad Javad Emadi

doi:10.1016/j.jfranklin.2023.11.039

Abstract

This paper proposes a novel approach to reducing the word error rate (WER) of an automatic speech recognition (ASR) system in a noisy reverberant room. This research utilizes the integration of beamforming, dereverberation, and ambisonic. Based on the demonstrated formula, the proposed system synthesizes the signal of desired points on the sphere surface from a combination of 32 signals of a uniform spherical microphone array (USMA). This method uses the non-parametric sound field reproduction technique in the spherical harmonics domain (SHD). Also, the suggested new geometry determines the place of the desired points. In addition to improving the dereverberation performance, the proposed method also improves the performance of the beamformer in terms of directivity factor (DF) and white noise gain (WNG). The results show that objective metrics such as PESQ are significantly improved, and the WER of the Kaldi and the WeNet ASR systems is reduced considerably.

Full Text