Abstract

Most on-device and cloud-based automatic speech recognition (ASR) systems perform poorly when the speech signal is corrupted by background noise such as vehicle, train, aircraft, fan, wind, rain, air-conditioner, and machinery noise, which is unavoidable in realistic scenarios. In this paper, we propose a novel speech signal quality assessment (SSQA) method that automatically assesses the quality of a recorded speech signal before it is processed on-device or sent to a cloud server. The proposed method is based on spectrogram features and two-dimensional convolutional neural networks (2D-CNNs). It is evaluated on a large set of noise-free speech signals and of noisy speech signals corrupted by various kinds of noise at different noise levels. Results show that the 2D-CNN-based method achieved an average sensitivity (Se) of 90.92%, specificity (Sp) of 98.44%, and overall accuracy (OA) of 96.44%. The method performed better at detecting noisy speech segments. The results also revealed ambiguity in manually labelling segments as noise-free or noisy. Therefore, the noise-free and noisy speech signals were passed to a publicly available ASR system to obtain the corresponding transcriptions, and the word error rate (WER) and character error rate (CER) were used to determine the noise level at which the ASR system fails to recognize the text correctly. In this way, a noise level is determined for each noise type and used to label recorded speech signals as acceptable or unacceptable speech segments. The proposed quality-aware ASR system has great potential to extend the battery life of portable ASR devices and, for cloud-based ASR systems, to reduce bandwidth and speech recognition software usage costs.
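The labelling and evaluation steps described above rest on a few standard formulas. The sketch below, assuming a Levenshtein (edit-distance) alignment, shows how WER and CER can be computed from an ASR transcription, how a segment might be labelled acceptable or unacceptable, and how Se, Sp, and OA follow from confusion-matrix counts. The `max_wer` threshold of 0.1, the function names, and the choice of "noisy/unacceptable" as the positive class are hypothetical illustrations, not values or conventions taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions + deletions + insertions) between token sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (cost 0 if tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def label_segment(reference: str, hypothesis: str, max_wer: float = 0.1) -> str:
    """Hypothetical rule: a segment is 'acceptable' if the ASR output's WER <= max_wer."""
    return "acceptable" if wer(reference, hypothesis) <= max_wer else "unacceptable"


def se_sp_oa(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float, float]:
    """Sensitivity, specificity and overall accuracy in percent.

    Assumes the positive class is 'noisy/unacceptable' (an illustrative choice).
    """
    se = 100.0 * tp / (tp + fn)
    sp = 100.0 * tn / (tn + fp)
    oa = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    return se, sp, oa


if __name__ == "__main__":
    ref, hyp = "turn on the fan", "turn of the fan"
    print(wer(ref, hyp))            # one substituted word out of four
    print(cer(ref, hyp))            # one substituted character out of fifteen
    print(label_segment(ref, hyp))  # WER exceeds the hypothetical 0.1 threshold
    print(se_sp_oa(tp=90, fn=10, tn=95, fp=5))
```

In this example a single misrecognized word gives WER = 1/4 = 0.25 but CER = 1/15, which illustrates why the two metrics can disagree about how badly a given noise level affects recognition.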
