Abstract

Current augmented Kalman filter (AKF)-based speech enhancement algorithms (SEAs) utilise a temporal convolutional network (TCN) to estimate the clean speech and noise linear prediction coefficients (LPCs). However, the multi-head attention network (MHANet) has demonstrated the ability to model the long-term dependencies of noisy speech more efficiently than TCNs. Motivated by this, we investigate the MHANet for LPC estimation. We aim to produce clean speech and noise LPC parameters with the least bias to date and, in turn, enhanced speech of higher quality and intelligibility than any current Kalman filter (KF) or AKF-based SEA. To this end, we investigate the MHANet within the DeepLPC framework, a deep learning framework for jointly estimating the clean speech and noise LPC power spectra. DeepLPC is selected as it exhibits significantly less bias than other frameworks, achieved by avoiding whitening filters and post-processing. DeepLPC-MHANet is evaluated on the NOIZEUS corpus using subjective AB listening tests, as well as seven objective measures (CSIG, CBAK, COVL, PESQ, STOI, SegSNR, and SI-SDR), and is compared to five existing deep learning-based methods. Among these, DeepLPC-MHANet produced clean speech LPC estimates with the least amount of bias. DeepLPC-MHANet-AKF also produced higher objective scores than any of the competing methods (with an improvement of 0.17 for CSIG, 0.15 for CBAK, 0.19 for COVL, 0.24 for PESQ, 3.70% for STOI, 1.03 dB for SegSNR, and 1.04 dB for SI-SDR over the next best method), and its enhanced speech was the most preferred amongst ten listeners. By producing LPC estimates with the least amount of bias to date, DeepLPC-MHANet enables the AKF to produce enhanced speech of higher quality and intelligibility than any previous KF or AKF-based method.

Highlights

  • Speech corrupted by background noise can reduce the efficiency of communication between speaker and listener

  • We investigate whether an attention-based network can produce clean speech and noise linear prediction coefficient (LPC) estimates with less bias, and obtain higher quality and intelligibility scores, than current deep learning-based Kalman filter (KF) and augmented KF (AKF) speech enhancement algorithms (SEAs)

  • For both real-world non-stationary and coloured noise conditions, the proposed method produced lower spectral distortion (SD) levels than DeepLPC-residual network (ResNet)-TCN [34], demonstrating that an attention-based network is able to produce clean speech LPC estimates with less bias


Introduction

Speech corrupted by background noise (or noisy speech) can reduce the efficiency of communication between speaker and listener. A speech enhancement algorithm (SEA) can be used to suppress the embedded background noise and increase the quality and intelligibility of noisy speech [1]. SEAs are useful in many applications where noisy speech is undesirable yet unavoidable; hearing aid devices and speech recognition systems, for example, typically rely upon SEAs for robustness. The noisy speech y(n), at discrete-time sample n, is given by y(n) = s(n) + v(n), (1) where s(n) is the clean speech and v(n) is the additive noise. The signal is processed frame-wise, where l ∈ {0, 1, …, L − 1} is the frame index with L being the total number of frames, and n ∈ {0, 1, …, N − 1}, where N is the total number of samples within each frame. The frame index is omitted from the following AKF recursive equations.
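To make the LPC-bias problem concrete, the sketch below estimates LPCs with the standard autocorrelation method and Levinson-Durbin recursion, then shows on a synthetic first-order autoregressive "speech" signal how additive noise biases the estimated coefficients toward zero (a whiter spectrum). This is a generic illustration under stated assumptions, not the paper's DeepLPC-MHANet method; the `lpc` function and the AR(1) signal are our own constructions.

```python
import numpy as np

def lpc(frame, order):
    """Estimate LPCs of one frame via the autocorrelation method
    and the Levinson-Durbin recursion. Returns (a, err), where
    a = [1, a_1, ..., a_order] and err is the prediction error power."""
    n = len(frame)
    # Biased autocorrelation estimates r[0..order].
    r = np.array([frame[:n - k] @ frame[k:] for k in range(order + 1)]) / n
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[i - 1:0:-1]
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

# Synthetic AR(1) "clean speech": s[t] = 0.9 * s[t-1] + e[t].
rng = np.random.default_rng(0)
e = rng.standard_normal(50_000)
s = np.zeros_like(e)
for t in range(1, len(s)):
    s[t] = 0.9 * s[t - 1] + e[t]

a_clean, _ = lpc(s, 1)                       # a_clean[1] is close to -0.9
a_noisy, _ = lpc(s + rng.standard_normal(len(s)), 1)
# Additive white noise inflates r[0] but not r[1], so |a_noisy[1]| < |a_clean[1]|:
# the noisy LPC estimate is biased toward a whiter spectrum.
```

For a unit-variance AR(1) innovation with coefficient 0.9, the clean estimate sits near −0.9 while unit-variance additive noise pulls it to roughly −0.76, which is exactly the kind of bias that motivates estimating the clean speech LPCs with a network rather than from the noisy signal directly.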
