Blind Speech Extraction Based on Rank-Constrained Spatial Covariance Matrix Estimation With Multivariate Generalized Gaussian Distribution

Yuki Kubo, Hiroshi Saruwatari, Norihiro Takamune, Daichi Kitamura

doi:10.1109/taslp.2020.3003165

Abstract

In this article, we propose a new blind speech extraction (BSE) method that robustly extracts directional speech from background diffuse noise by combining independent low-rank matrix analysis (ILRMA) and efficient rank-constrained spatial covariance matrix (SCM) estimation. To achieve more accurate BSE than ILRMA, which assumes each source to be a point source (rank-1 spatial model), the proposed method restores the lost spatial basis for the full-rank SCM of diffuse noise. We adopt the multivariate complex generalized Gaussian distribution (GGD) as the statistical generative model to express various types of observed signals. To estimate the model parameters for an arbitrary shape parameter of the multivariate GGD, we derive a new inequality for rank-constrained SCMs. We also propose new acceleration methods to accomplish much faster extraction than conventional blind source separation methods. In BSE experiments using simulated and real recorded data, we confirm that the proposed method achieves more accurate and faster speech extraction than conventional methods.

Highlights

Blind source separation (BSS) [1] is a technique for separating an observed multichannel signal, which is a mixture of multiple sources, into each source without any prior information about the sources or the mixing system
Let us denote a multichannel observed signal that is obtained via a short-time Fourier transform (STFT) as $\mathbf{x}_{ij} = (x_{ij,1}, \ldots, x_{ij,M})^{\mathsf{T}} \in \mathbb{C}^{M}$, where i = 1, . . . , I, j = 1, . . . , J, and m = 1, . . . , M are the indices of the frequency bins, time frames, and microphones, respectively, and T denotes the transpose
On the basis of the above motivation, we propose the following new estimation method for the full-rank spatial covariance matrix (SCM) of diffuse noise: (a) the rank-1 SCM for the directional target speech and rank-(M−1) SCM for diffuse noise are estimated by independent low-rank matrix analysis (ILRMA) and fixed, (b) the lost spatial basis for diffuse noise is restored to estimate noise components in the direction of the target speech, and (c) a multichannel Wiener filter is applied to suppress the noise components remaining in the separated directional target speech
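
Steps (a) to (c) above can be illustrated with a minimal NumPy sketch for a single time-frequency bin. This is not the authors' implementation: the spatial parameters are random stand-ins for what ILRMA would estimate, the lost basis `b` is assumed to be the eigenvector of the rank-(M−1) noise SCM with the (near-)zero eigenvalue, and the weight `lam` and the variances `r_target` and `r_noise` are fixed by hand rather than estimated. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4  # number of microphones (illustrative)

def crandn(*shape):
    """Complex Gaussian samples, used only as illustrative data."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# (a) Fixed spatial parts, standing in for what ILRMA would provide.
a = crandn(M)          # steering vector of the target (hypothetical)
B = crandn(M, M - 1)   # spans the rank-(M-1) noise subspace (hypothetical)
scm_noise_deficient = B @ B.conj().T / (M - 1)  # Hermitian, rank M-1

# (b) Restore the lost basis: take the eigenvector belonging to the
# (near-)zero eigenvalue and add it back with weight lam (assumed known).
eigvals, eigvecs = np.linalg.eigh(scm_noise_deficient)  # ascending order
b = eigvecs[:, 0]
lam = 0.5
scm_noise = scm_noise_deficient + lam * np.outer(b, b.conj())

# (c) Multichannel Wiener filter for one time-frequency bin.
r_target, r_noise = 2.0, 1.0             # source variances (hypothetical)
R_t = r_target * np.outer(a, a.conj())   # rank-1 target SCM
R_n = r_noise * scm_noise                # full-rank noise SCM
x = crandn(M)                            # observed STFT vector x_ij
gain = np.linalg.solve(R_t + R_n, x)     # (R_t + R_n)^{-1} x
target_est = R_t @ gain                  # estimated target image
noise_est = R_n @ gain                   # estimated noise image
```

Because the target SCM is rank-1, `target_est` always lies along the steering vector `a`, and `target_est + noise_est` reproduces `x`. In the proposed method the restored-basis weight and the source variances are estimated under the multivariate GGD model; the sketch only shows how the pieces fit together once they are available.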

Summary

INTRODUCTION

Blind source separation (BSS) [1] is a technique for separating an observed multichannel signal, which is a mixture of multiple sources, into its individual sources without any prior information about the sources or the mixing system. These methods assume a rank-1 spatial model: the frequency-wise acoustic path of each source can be represented by a single time-invariant spatial basis, which is often called a steering vector. Under this assumption, the determined BSS problem reduces to the estimation of a demixing matrix for each frequency.

[...] diffuse noise can cancel the directional target speech in the BSS methods based on the rank-1 spatial model [17], resulting in the accurate estimation of a rank-(M−1) diffuse noise SCM, where M denotes the number of microphones.
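
The rank-1 spatial model and the demixing matrix mentioned above can be written compactly as follows. The notation is a common convention and an assumption here, not taken from this page: $s_{ij,n}$ is the n-th source, $\mathbf{a}_{i,n}$ its steering vector, $\mathbf{A}_i$ the mixing matrix, and $\mathbf{W}_i$ the demixing matrix, with N = M sources in the determined case.

```latex
% Rank-1 spatial model for the determined case (N = M); notation assumed.
\begin{align}
  \mathbf{x}_{ij} &= \sum_{n=1}^{N} \mathbf{a}_{i,n}\, s_{ij,n}
                   = \mathbf{A}_i \mathbf{s}_{ij},
  & \mathbf{A}_i &= (\mathbf{a}_{i,1}, \ldots, \mathbf{a}_{i,N}), \\
  \mathbf{y}_{ij} &= \mathbf{W}_i \mathbf{x}_{ij},
  & \mathbf{W}_i &= \mathbf{A}_i^{-1} \quad \text{(ideally)}, \\
  \mathbf{R}_{ij,n} &= \mathbb{E}\big[|s_{ij,n}|^{2}\big]\,
                       \mathbf{a}_{i,n} \mathbf{a}_{i,n}^{\mathsf{H}}
  && \text{(rank-1 SCM of source } n\text{)}.
\end{align}
```

Each source contributes an SCM of rank 1 under this model, so M−1 noise outputs can span at most a rank-(M−1) noise SCM. This is why one spatial basis of the full-rank diffuse noise SCM is lost and has to be restored.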

Definitions

MNMF and FastMNMF

Motivation and Strategy

Model and Speech Extraction

Optimization Framework

Generic Inequality and Identity for Rank-Constrained SCM Estimation

MM-Algorithm-Based and ME-Algorithm-Based Update Rules

Motivation

Key Concept

Second-Stage Acceleration

Advantage of Proposed Accelerated Update Rules

Experimental Condition

Comparison Between MM and ME Algorithms

SDR and SCM Behavior Comparison Between Proposed and Conventional Methods

Computational Time Comparison

BSE EXPERIMENT ON REAL RECORDED DATA

CONCLUSION

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Jan 1, 2020
Citations: 41
License type: CC BY 4.0

Similar Papers

Deficient-basis-complementary rank-constrained spatial covariance matrix estimation based on multivariate generalized Gaussian distribution for blind speech extraction
Yuto Kondo ... Yuki Kubo
EURASIP Journal on Advances in Signal Processing | VOL. 2022
22 Sep 2022

Evaluation of Multichannel Hearing Aid System by Rank-Constrained Spatial Covariance Matrix Estimation
Masakazu Une ... Shoji Makino
01 Nov 2019

Deficient Basis Estimation of Noise Spatial Covariance Matrix for Rank-Constrained Spatial Covariance Matrix Estimation Method in Blind Speech Extraction
Yuto Kondo ... Hiroshi Saruwatari
06 Jun 2021

Efficient Full-Rank Spatial Covariance Estimation Using Independent Low-Rank Matrix Analysis for Blind Source Separation
Yuki Kubo ... Hiroshi Saruwatari
01 Sep 2019
