Abstract
In intelligent manufacturing, machines with speech separation capability can markedly improve the efficiency of human-computer interaction, which benefits the rapid development of the intelligent manufacturing industry. In deep-learning-based single-channel speech separation, time-domain features outperform frequency-domain features. However, current time-domain methods are not robust in real noise environments, and time-domain features alone limit the performance of the separation model. We therefore propose a Time-and-Frequency fusion model based on multi-scale convolution (Tff-MscNet), which integrates time-domain and frequency-domain features to enrich the multidimensional information of the data. To further improve the performance of the separation network, we introduce a multi-scale convolution block that strengthens the network's feature extraction ability. We compare against the Conv-TasNet baseline and the latest time-frequency fusion speech separation baseline on the GRID speech dataset. Experiments show that the proposed method substantially improves both separation performance and robustness in environments with real noise.
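The abstract does not specify the internals of the multi-scale convolution block, but the general idea can be illustrated with a minimal NumPy sketch: the same waveform is convolved in parallel with kernels of several sizes (the kernel sizes, random weights, and function names below are illustrative assumptions, not the paper's actual architecture), and the branch outputs are stacked as feature channels.

```python
import numpy as np

def conv1d_same(signal, kernel):
    """'Same'-padded 1-D correlation of a waveform with a kernel."""
    k = len(kernel)
    pad_left = k // 2
    padded = np.pad(signal, (pad_left, k - 1 - pad_left))
    return np.array([np.dot(padded[i:i + k], kernel)
                     for i in range(len(signal))])

def multiscale_block(signal, kernel_sizes=(3, 5, 9), seed=0):
    """Illustrative multi-scale convolution: parallel branches with
    different receptive fields, stacked along a channel axis.
    Kernel weights are random placeholders for learned filters."""
    rng = np.random.default_rng(seed)
    branches = [conv1d_same(signal, rng.standard_normal(k))
                for k in kernel_sizes]
    return np.stack(branches)  # shape: (num_scales, len(signal))

# Toy waveform standing in for a speech frame.
x = np.sin(np.linspace(0, 4 * np.pi, 64))
feats = multiscale_block(x)
print(feats.shape)  # (3, 64)
```

The differing kernel sizes give each branch a different receptive field, so short transients and longer spectral patterns are captured simultaneously; in a trained network the random kernels would be learned filters.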