Abstract

In this paper, a novel speech enhancement method based on a hybrid machine-learning architecture that combines U-Net and nonnegative matrix factorization (NMF) is proposed. The method aims to pair the accurate separation that U-Net achieves in known noise environments with the ability of NMF with online dictionary learning to adapt to unseen noises. To merge the two architectures, a modified U-Net with a temporal activation layer (TAU-Net) is jointly optimized with NMF models that represent universal speech and noise. The proposed method first estimates temporal activations with the encoder of TAU-Net. Next, NMF with online dictionary learning refines these initial activations to suppress the cross-activations caused by noises that were not seen during TAU-Net training. Finally, clean speech is obtained by feeding the adjusted temporal activations to the TAU-Net decoder. The effectiveness of the proposed TAU-Net-based speech enhancement method is evaluated in various unseen noise environments. The proposed method achieves average signal-to-distortion ratios (SDRs) that are 2.32 dB and 5.68 dB higher than those of the baseline methods, the speech enhancement generative adversarial network (SEGAN) and U-Net, respectively.
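
The abstract describes the activation-refinement step only at a high level. As a rough illustration of the general technique, and not the authors' TAU-Net implementation, the sketch below keeps a speech dictionary fixed, adapts a noise dictionary to the incoming signal block by block with Kullback-Leibler (KL) multiplicative NMF updates, and then applies a Wiener-style soft mask. The function names (`kl_nmf_refine`, `enhance_online`), the block-wise "online" scheme, the KL cost, and the random stand-in data are all assumptions. In the described method the initial activations would come from the TAU-Net encoder and the refined activations would go to its decoder, which the mask here merely stands in for.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero


def kl_nmf_refine(V, W_s, W_n, H, n_iter=30):
    """Refine activations H and noise bases W_n for one block of frames.

    V   : (F, T) nonnegative magnitude spectrogram block
    W_s : (F, Ks) speech dictionary, kept fixed
    W_n : (F, Kn) noise dictionary, adapted to the input
    H   : (Ks + Kn, T) initial activations (assumed to come from an encoder;
          any nonnegative array works for this sketch)
    """
    W_n, H = W_n.copy(), H.copy()
    Ks = W_s.shape[1]
    for _ in range(n_iter):
        # KL multiplicative update of all activations with the dictionaries fixed.
        W = np.hstack([W_s, W_n])
        H *= (W.T @ (V / (W @ H + EPS))) / (W.sum(axis=0)[:, None] + EPS)
        # KL multiplicative update of the noise bases only; speech bases stay fixed.
        R = V / (W @ H + EPS)
        H_n = H[Ks:]
        W_n *= (R @ H_n.T) / (H_n.sum(axis=1)[None, :] + EPS)
        # Normalize each noise basis to unit L1 norm and rescale its activation
        # row so that the product W_n @ H_n is unchanged.
        scale = W_n.sum(axis=0) + EPS
        W_n /= scale
        H[Ks:] *= scale[:, None]
    return W_n, H


def enhance_online(V, W_s, W_n, H_init, block=10, n_iter=30):
    """Process V block by block, carrying the adapted noise dictionary forward
    (a hypothetical online-style scheme; the paper's update rule may differ)."""
    Ks = W_s.shape[1]
    out = np.zeros_like(V)
    for start in range(0, V.shape[1], block):
        sl = slice(start, start + block)
        W_n, H = kl_nmf_refine(V[:, sl], W_s, W_n, H_init[:, sl], n_iter)
        S = W_s @ H[:Ks]  # speech part of the NMF model
        N = W_n @ H[Ks:]  # noise part of the NMF model
        out[:, sl] = S / (S + N + EPS) * V[:, sl]  # Wiener-style soft mask
    return out, W_n


# Usage with random stand-in data (real inputs would be STFT magnitudes).
rng = np.random.default_rng(0)
F, T, Ks, Kn = 64, 40, 8, 4
V = rng.random((F, T))
W_s = rng.random((F, Ks))
W_n0 = rng.random((F, Kn))
H0 = rng.random((Ks + Kn, T))
S_hat, W_n = enhance_online(V, W_s, W_n0, H0)
print(S_hat.shape)  # (64, 40)
```

Keeping the speech bases fixed while letting only the noise bases move is one common way to absorb noise that a pretrained model has not seen without distorting the speech model; the soft mask guarantees that the enhanced magnitude never exceeds the input magnitude.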
