Abstract
For speech enhancement, contextual information is important for accurate speech spectrum estimation. Conventional convolution layers are generally used to mine implicit correlations from adjacent regions. However, fixed convolution kernels cannot capture non-local information well, such as the correlations between the pitch and its overtones, or full-band noise. To better capture dependencies along the temporal and frequency dimensions, we introduce a multi-scale informative perceptual network (MIPNet) that incorporates both localized patterns and global correlations into feature extraction for monaural speech enhancement. MIPNet is an encoder-decoder network composed of multi-scale perceptual modules (MPMs) that extract local patterns; each MPM has two branches, built from dilated convolution and stacked fully convolutional layers. The MPM is designed to be sensitive to long-term context so that it can detect multi-scale adjacent information, which helps rectify informative features and improves the efficiency and accuracy of feature coding. In addition, non-local modules are applied as bottleneck layers to obtain a global information flow. By combining MPMs and non-local modules, the proposed network aggregates multi-scale contextual information, allowing it to model implicit acoustic features and suppress noise components. On the Voice Bank + DEMAND dataset, MIPNet obtains a 14.34% improvement in SSNR, reflecting its strength in noise suppression. Experimental results on WSJ0 and TIMIT demonstrate that the proposed model, with few parameters, exhibits strong robustness and good performance in terms of objective speech intelligibility and quality under various noise conditions.
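To make the contrast between local and global context concrete, the following is a minimal illustrative sketch, not the MIPNet implementation. It shows a 1-D dilated convolution, the receptive field of a stack of such layers, and a simplified non-local operation in which every position attends to every other position. All function names, the identity embeddings in the non-local step, and the residual connection are assumptions made for illustration; the abstract does not specify kernel sizes, dilation rates, or how the two MPM branches are merged.

```python
import math


def dilated_conv1d(x, kernel, dilation=1):
    """Zero-padded, same-length 1-D dilated cross-correlation over a list of floats."""
    center = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for t, w in enumerate(kernel):
            j = i + (t - center) * dilation  # taps are spaced `dilation` apart
            if 0 <= j < len(x):              # positions outside the input count as zero
                acc += w * x[j]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def non_local(x):
    """Simplified non-local operation with a residual connection.

    x is a list of feature vectors (one list of floats per position).
    Assumption: identity embeddings and dot-product similarity with softmax
    normalization; a trained module would use learned projections instead.
    """
    out = []
    for xi in x:
        scores = [sum(a * b for a, b in zip(xi, xj)) for xj in x]
        m = max(scores)                      # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        agg = [
            sum(w * xj[c] for w, xj in zip(weights, x)) / total
            for c in range(len(xi))
        ]
        out.append([a + b for a, b in zip(xi, agg)])  # residual: input + aggregated context
    return out


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    # With dilation 2, a 3-tap kernel reaches positions two steps apart.
    print(dilated_conv1d(impulse, [1, 1, 1], dilation=2))
    # Three stacked 3-tap layers with dilations 1, 2, 4.
    print(receptive_field(3, [1, 2, 4]))
    # Every output position depends on all input positions.
    print(non_local([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]))
```

In this sketch the dilated convolution widens the context with a fixed, sparse sampling pattern, whereas the non-local step weights all positions by similarity, which is the kind of dependency (for example between a pitch and its overtones) that a fixed kernel may miss.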