Speech enhancement using group complementary joint sparse representations in modulation domain

Zhuopeng Xie, Huichao Yang, Zhongfu Ye

doi:10.1016/j.apacoust.2022.109081

Abstract

The internal group structure of signals has been exploited in some speech enhancement (SE) algorithms, but most of them operate in the acoustic domain. In this paper, we propose to incorporate group structure in the modulation domain as prior information for complementary joint sparse representations (CJSR). The modulation transform is applied to generate a set of sub-band amplitude spectra at different modulation frequencies, whose time–frequency (TF) distributions differ from those in the acoustic domain. For each of these spectra, we learn a pair of joint dictionaries whose atoms are clustered into groups. The resulting dictionaries capture the structured characteristics of speech and noise. To represent a signal, we use an objective function based on the sparse group lasso, which activates atoms at the group level. In this way, speech is robustly recovered from the mixture according to the preset group pattern. An ablation study shows that each part of the proposed method, namely modulation-domain processing and group sparsity, benefits CJSR on its own, and that combining the two yields a further performance improvement. In the final comparative experiment, the proposed method produces better objective speech quality, improving PESQ by 6.0% and segSNR by 12.7% compared with the baseline method.
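To make the group-level activation concrete, the following is a minimal sketch of a generic sparse group lasso solver, not the authors' implementation. It assumes the standard objective 0.5*||y - Dx||^2 + lam_l1*||x||_1 + lam_group*sum_g ||x_g||_2 over non-overlapping groups, solved by proximal gradient descent. The function names, the penalty weights, and the choice of solver are illustrative assumptions; the abstract does not specify them, and the joint-dictionary and modulation-domain parts of the method are not modeled here.

```python
import numpy as np


def prox_sparse_group_lasso(x, groups, t_l1, t_group):
    """Proximal operator of t_l1*||x||_1 + t_group*sum_g ||x_g||_2.

    `groups` is a list of index arrays forming non-overlapping groups
    (hypothetical layout; the paper's grouping of atoms may differ).
    """
    # Element-wise soft threshold handles the l1 (within-group sparsity) part.
    z = np.sign(x) * np.maximum(np.abs(x) - t_l1, 0.0)
    out = np.zeros_like(z)
    for idx in groups:
        norm = np.linalg.norm(z[idx])
        # Group soft threshold: a weak group is switched off as a whole,
        # a strong group is shrunk toward zero but stays active.
        if norm > t_group:
            out[idx] = (1.0 - t_group / norm) * z[idx]
    return out


def sparse_group_lasso(y, D, groups, lam_l1=0.05, lam_group=0.5, n_iter=500):
    """Proximal gradient descent for the assumed objective above."""
    # Step size 1/L, where L is the Lipschitz constant of the gradient
    # of the data term (squared spectral norm of D).
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)
        x = prox_sparse_group_lasso(
            x - step * grad, groups, step * lam_l1, step * lam_group
        )
    return x
```

In this sketch, a group whose coefficients stay jointly small after the element-wise threshold is zeroed entirely, which is the sense in which atoms are activated at the group level rather than one at a time.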

Full Text