Abstract

In speech enhancement, contextual information is important for accurate estimation of the speech spectrum. Conventional convolution layers mine implicit correlations from adjacent regions, but a fixed convolution kernel cannot capture non-local information well, such as the correlation between the pitch and its overtones or full-band noise. To better capture dependencies along the temporal and frequency dimensions, we introduce a multi-scale informative perceptual network (MIPNet) for monaural speech enhancement, which combines localized patterns with global correlations during feature extraction. MIPNet is an encoder-decoder built from multi-scale perceptual modules (MPMs), each with two branches that use dilated convolution and stacked fully convolutional layers to extract local patterns. The MPM is designed to be sensitive to long-term context and to detect multi-scale adjacent information, which helps rectify informative features and improves the efficiency and accuracy of feature coding. In addition, non-local modules serve as bottleneck layers to obtain a global information flow. By combining MPMs and non-local modules, the proposed network aggregates multi-scale contextual information, models implicit acoustic features, and suppresses noise components. On the Voice Bank + DEMAND dataset, MIPNet obtains a 14.34% improvement in SSNR, reflecting its strength in noise suppression. Experimental results on WSJ0 and TIMIT demonstrate that the proposed model, with few parameters, exhibits strong robustness and good performance in terms of objective speech intelligibility and quality under various noise conditions.
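
The contrast drawn above between local and non-local context can be illustrated with a minimal NumPy sketch. This is not the MIPNet implementation: the function names `dilated_conv1d` and `non_local` are hypothetical, the convolution is one-dimensional, and the non-local operation omits the learned projections a trained module would normally have. It only shows that a dilated kernel widens the receptive field while staying local, whereas a non-local operation lets every position draw on every other position.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation=1):
    """'Valid' 1-D dilated convolution (cross-correlation form).

    Illustrative only: a kernel of length k covers a span of
    (k - 1) * dilation + 1 input samples, so dilation widens the
    receptive field without adding weights.
    """
    k = len(kernel)
    span = (k - 1) * dilation + 1
    n_out = len(x) - span + 1
    out = np.zeros(n_out)
    for i in range(n_out):
        # Taps are spaced `dilation` samples apart.
        out[i] = np.dot(x[i : i + span : dilation], kernel)
    return out

def non_local(x):
    """Simplified non-local operation on x of shape (positions, channels).

    Assumption: no learned projections; similarity is a plain dot product.
    Each output position is a softmax-weighted sum over ALL positions,
    added back to the input as a residual.
    """
    scores = x @ x.T                               # pairwise similarity
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # softmax per row
    return x + weights @ x

# Usage: a 3-tap kernel with dilation 2 spans 5 samples per output.
signal = np.arange(10.0)
local_out = dilated_conv1d(signal, np.array([1.0, 1.0, 1.0]), dilation=2)

# Usage: 4 positions (e.g. frequency bins) with 2 channels each.
features = np.ones((4, 2))
global_out = non_local(features)
```

In this sketch the dilated kernel still sees only five neighbouring samples per output, while each row of `global_out` depends on all four input positions, which is the kind of long-range dependency (for example, between a pitch and its overtones) that the abstract attributes to non-local modules.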
