A Preprocessing Strategy for Denoising of Speech Data Based on Speech Segment Detection

Seung-Jun Lee, Hyuk-Yoon Kwon

doi:10.3390/app10207385

Open Access

Abstract

In this paper, we propose a preprocessing strategy for denoising of speech data based on speech segment detection. Computationally efficient speech denoising is necessary to develop a method that scales to large data sets. This becomes even more important with the development of deep learning-based methods, which generally achieve high performance but incur significant computational costs. The basic idea of the proposed method is to use speech segment detection to exclude non-speech segments before denoising. Speech segment detection can exclude, at negligible cost, non-speech segments that would otherwise be removed by the far more costly denoising process, while maintaining the accuracy of denoising. First, we devise a framework that chooses the best preprocessing method for denoising, based on speech segment detection, for a target environment. For this, we define denoising environments by different signal-to-noise ratio (SNR) levels and multiple evaluation metrics. The framework finds the speech segment detection method best tailored to a target environment according to the performance evaluation of the candidate methods. Next, we extensively investigate the accuracy of the speech segment detection methods. We evaluate five speech segment detection methods under different SNR levels and evaluation metrics. In particular, we show that the trade-off between the precision and recall of each method can be adjusted by controlling a parameter. Finally, we incorporate the best speech segment detection method for a target environment into a denoising process.
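The preprocessing idea can be sketched as follows. This is a minimal sketch under stated assumptions: a simple frame-energy detector stands in for whichever of the five speech segment detection methods is selected, `denoise` is a hypothetical placeholder for an expensive denoiser such as the Wavenet-based model, and non-speech regions are output as silence. The names `detect_speech_frames`, `frames_to_segments`, and `preprocess_and_denoise` are illustrative only.

```python
import numpy as np


def detect_speech_frames(signal, frame_len, threshold):
    """Hypothetical energy-based detector: True for frames whose mean energy exceeds threshold."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1)
    return energy > threshold


def frames_to_segments(mask, frame_len):
    """Merge runs of consecutive speech frames into (start, end) sample ranges."""
    segments, start = [], None
    for i, is_speech in enumerate(mask):
        if is_speech and start is None:
            start = i * frame_len
        elif not is_speech and start is not None:
            segments.append((start, i * frame_len))
            start = None
    if start is not None:
        segments.append((start, len(mask) * frame_len))
    return segments


def preprocess_and_denoise(signal, denoise, frame_len=400, threshold=1e-3):
    """Call the expensive denoiser only on detected speech segments; output silence elsewhere."""
    signal = np.asarray(signal, dtype=float)
    mask = detect_speech_frames(signal, frame_len, threshold)
    segments = frames_to_segments(mask, frame_len)
    output = np.zeros_like(signal)  # non-speech samples (and any trailing partial frame) stay zero
    for start, end in segments:
        output[start:end] = denoise(signal[start:end])
    return output, segments
```

Merging consecutive speech frames into segments means the denoiser runs on a few contiguous chunks rather than on many short frames. In this sketch, the cost saving comes from the samples that never reach `denoise`.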
Through extensive experiments, we show that the accuracy of the proposed scheme is comparable to or even better than that of Wavenet-based denoising, one of the recent advanced denoising methods based on deep neural networks, in terms of multiple denoising evaluation metrics, i.e., SNR, STOI, and PESQ. At the same time, the proposed scheme reduces the denoising time of the Wavenet-based denoising by approximately 40–50%, depending on the speech segment detection method used.
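Of the three reported metrics, SNR is the simplest to compute. The sketch below assumes the common definition of output SNR: the energy of the clean reference divided by the energy of the residual error, expressed in dB. The exact variant used in the paper is not stated here, and the function name `snr_db` and the `eps` guard are illustrative. STOI and PESQ require dedicated implementations and are not reproduced.

```python
import numpy as np


def snr_db(clean, estimate, eps=1e-12):
    """Output SNR in dB of a denoised estimate against a clean reference (one common definition)."""
    clean = np.asarray(clean, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    residual = clean - estimate  # everything the denoiser failed to recover or added
    # eps avoids division by zero when the estimate matches the reference exactly
    return 10.0 * np.log10((np.sum(clean ** 2) + eps) / (np.sum(residual ** 2) + eps))
```

For example, an estimate that is uniformly 10% too small has a residual energy 100 times smaller than the signal energy, giving 20 dB.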

Highlights

Denoising is the process of extracting only the clean speech from a mixture of speech and noise
We show that the accuracy of the proposed preprocessing strategy is comparable to or even better than that of the original Wavenet-based denoising in terms of multiple evaluation metrics of denoising, i.e., signal-to-noise ratio (SNR), STOI, and PESQ, while it reduces the denoising time of the Wavenet-based denoising by 40.06–50.76%, depending on the speech segment detection method used
For each speech segment detection method, we chose the threshold (the parameter that affects the method's accuracy, as presented in Section 3) that gave the highest F1-score under the condition that recall was greater than precision, so as to reduce the filtering out of speech segments
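The threshold-selection rule in the last highlight can be written as a small search over candidate thresholds. In this sketch, each candidate is a (threshold, precision, recall) triple that would come from evaluating a detector on labeled data. The function names and the example values in the usage note are hypothetical, not results from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def select_threshold(candidates):
    """Return the threshold with the highest F1 among candidates whose recall exceeds precision.

    candidates: iterable of (threshold, precision, recall) tuples.
    Returns None if no candidate satisfies recall > precision.
    """
    # Favoring recall keeps more true speech, so fewer speech segments are filtered out
    eligible = [(t, p, r) for t, p, r in candidates if r > p]
    if not eligible:
        return None
    best = max(eligible, key=lambda c: f1_score(c[1], c[2]))
    return best[0]
```

With hypothetical candidates `(0.1, 0.60, 0.95)`, `(0.2, 0.75, 0.85)`, and `(0.3, 0.90, 0.80)`, the third has the highest F1 but is excluded because its recall is below its precision, so `0.2` is selected.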

Summary

Introduction

Denoising is the process of extracting only the clean speech from a mixture of speech and noise. The main goals of denoising are to enhance the perceptual quality of speech and to improve the robustness of speech recognition. Applications of denoising include cellular and teleconference communications affected by background and channel noise [1]. Denoising performance has a considerable impact on both the comprehensibility and the post-processing efficiency of speech data. Various denoising methods have been studied [2]. As shown, we indicate that denoising, i.e., mitigating the noise

Journal: Applied Sciences
Publication Date: Oct 21, 2020
Citations: 7
License type: CC BY 4.0

Similar Papers

A Multi-metric Selection Strategy for Evolutionary Symbolic Regression
Hu Zhang ... Aimin Zhou
11 Oct 2020

Speech segment detection and word recognition
Tetsuya Muroi
The Journal of the Acoustical Society of America | VOL. 112
01 Jan 2002

An Approach to Policy Gradient Reinforcement Learning with Multiple Evaluation Metrics
Yoshihiro Yasutake ... Sunao Sawada
01 Jun 2019

SN Ratio Estimation and Speech Segment Detection of Extracted Signals Through Independent Component Analysis
Takeshi Koya ... Takaaki Ishibashi
Journal of Advanced Computational Intelligence and Intelligent Informatics | VOL. 14
20 May 2010
