Abstract

The performance of existing speech enhancement algorithms is unsatisfactory in low signal-to-noise ratio (SNR), non-stationary noise environments. To address this problem, a novel deep-learning speech enhancement algorithm based on a multi-resolution feature and an adaptive mask is presented in this paper. First, we construct a new feature called the multi-resolution auditory cepstral coefficient (MRACC). This feature, extracted from four cochleagrams of different resolutions, captures both local information and spectrotemporal context while reducing algorithm complexity. Second, an adaptive mask (AM) that tracks noise changes is put forward for speech enhancement. The AM flexibly combines the advantages of the ideal binary mask (IBM) and the ideal ratio mask (IRM) as the SNR changes. Third, a deep neural network (DNN) is used as a nonlinear function to estimate the adaptive mask, with the MRACC feature and its first and second derivatives as the DNN input. Finally, the estimated AM is applied to weight the noisy speech and obtain the enhanced speech. Experimental results show that the proposed algorithm not only further improves speech quality and intelligibility but also suppresses more noise than the comparison algorithms, while having lower complexity.
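To make the adaptive-mask idea concrete, the sketch below blends an IBM and an IRM with an SNR-dependent weight. The specific blending function (a sigmoid of the local SNR, controlled by a hypothetical parameter `alpha`) is an illustrative assumption, not the paper's exact formula; the IBM threshold and the power-ratio form of the IRM are standard definitions.

```python
import numpy as np

def ideal_binary_mask(snr_db, threshold_db=0.0):
    # IBM: 1 where the local SNR exceeds the threshold, else 0.
    return (snr_db > threshold_db).astype(float)

def ideal_ratio_mask(snr_db):
    # IRM in its common power-ratio form: sqrt(SNR / (SNR + 1)).
    snr = 10.0 ** (snr_db / 10.0)
    return np.sqrt(snr / (snr + 1.0))

def adaptive_mask(snr_db, alpha=1.0):
    # Hypothetical blend: a sigmoid weight w shifts the mask toward the
    # IBM at high local SNR and toward the IRM at low local SNR.
    # (The paper's actual AM formula may differ; this is a sketch.)
    w = 1.0 / (1.0 + np.exp(-alpha * snr_db))
    return w * ideal_binary_mask(snr_db) + (1.0 - w) * ideal_ratio_mask(snr_db)
```

In a full pipeline, the DNN would predict this mask per time-frequency unit from the MRACC features, and the predicted mask would then weight the noisy cochleagram to resynthesize enhanced speech.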

Highlights

  • Over the past several decades, a large number of approaches have been proposed to solve the problem of speech enhancement

  • A speech enhancement algorithm based on multi-resolution auditory cepstral coefficient (MRACC) and deep neural network (DNN) is proposed

  • To verify the effectiveness of the proposed algorithm, we select a study on training targets for supervised speech separation as the first comparison algorithm [20], and a feature study for classification-based speech separation at very low signal-to-noise ratios as the second comparison algorithm [21]

Summary

Introduction

Over the past several decades, a large number of approaches have been proposed to solve the problem of speech enhancement. In 2017, Li et al. presented an IRM estimation using deep neural networks for monaural speech segregation in noisy reverberant conditions [24], Zhang et al. presented multi-target ensemble learning for monaural speech separation [25], and Sun et al. proposed multiple-target deep learning for LSTM-RNN-based speech enhancement [26]. In the latter algorithm, an IRM and a log-power spectrum are used as the training targets of the DNN.

Time-frequency decomposition
Feature extraction
Extraction of MRACC feature
Extraction of dynamic feature
Deep neural network model
Algorithm implementation steps
Experimental data
Subjective performance evaluation
Algorithm complexity test
Conclusion
Availability of data and materials