Abstract

Several architectures have been proposed for deep neural network (DNN)-based speech enhancement; however, all of them use training targets related to the clean speech signal. In this paper, we evaluate several training targets in a noise-prediction DNN framework and compare the noise-prediction framework with a conventional speech-prediction network. Objective test results show that, in the noise-prediction framework, the mask-based targets outperform the spectral magnitude target. The results also show that, in seen noise conditions, the best noise target outperforms the speech-prediction network on objective quality and intelligibility metrics. In unseen noise conditions the noise target remains competitive: it performs slightly worse in objective quality but outperforms the speech-based target in objective intelligibility.
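The abstract does not define the individual targets. The following is a minimal sketch of two kinds of noise targets it contrasts: a spectral magnitude target and a mask-based target. It also shows how either could be turned back into a speech estimate. Everything here is an assumption for illustration, not the paper's method. The mask is assumed to be the noise counterpart of an ideal ratio mask (IRM). The mixture magnitude is approximated as sqrt(|S|^2 + |N|^2), meaning speech and noise are treated as uncorrelated. The helpers `noise_targets`, `speech_from_noise_mask`, and `speech_from_noise_magnitude` are hypothetical names.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero in silent time-frequency bins


def noise_targets(speech_mag: np.ndarray, noise_mag: np.ndarray):
    """Hypothetical helper: return (magnitude target, mask target) for a noise-predicting DNN.

    Assumes an IRM-style noise mask, which lies in [0, 1] by construction.
    """
    noise_magnitude_target = noise_mag
    noise_mask_target = np.sqrt(noise_mag**2 / (speech_mag**2 + noise_mag**2 + EPS))
    return noise_magnitude_target, noise_mask_target


def speech_from_noise_mask(mixture_mag: np.ndarray, noise_mask: np.ndarray) -> np.ndarray:
    # Under the uncorrelated-sources assumption, speech IRM = sqrt(1 - noise IRM^2).
    speech_mask = np.sqrt(np.clip(1.0 - noise_mask**2, 0.0, 1.0))
    return speech_mask * mixture_mag


def speech_from_noise_magnitude(mixture_mag: np.ndarray, noise_mag_est: np.ndarray) -> np.ndarray:
    # Power spectral subtraction, floored at zero where the noise estimate overshoots.
    return np.sqrt(np.maximum(mixture_mag**2 - noise_mag_est**2, 0.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = rng.rayleigh(1.0, size=(4, 257))  # toy speech magnitudes (frames x bins)
    N = rng.rayleigh(0.5, size=(4, 257))  # toy noise magnitudes
    Y = np.sqrt(S**2 + N**2)              # assumed mixture magnitude
    mag_t, mask_t = noise_targets(S, N)
    # With oracle (ideal) targets, both routes recover the toy speech magnitudes.
    print(np.allclose(speech_from_noise_mask(Y, mask_t), S, atol=1e-5))  # True
    print(np.allclose(speech_from_noise_magnitude(Y, mag_t), S))          # True
```

One design difference is visible in the sketch. The mask target is bounded in [0, 1], whereas the magnitude target is unbounded and scales with the noise level. This sketch uses oracle targets only; it makes no claim about how a trained network's estimates would behave.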
