Abstract
To address the low accuracy and poor retrieval performance of feature extraction on environmental sound data in Internet consumer finance scenarios, a novel method for voice filtering and environmental sound feature extraction (VF-EFENet) is proposed. First, the Conv-TasNet speech separation model is pruned and transferred to filter out the foreground voice. Second, an environmental sound feature extraction model is built on an improved VGGish, and pretrained weights are used to improve feature extraction accuracy. Finally, metric learning is used to optimize the distance function and improve retrieval accuracy: it draws features of the same audio class together in feature space while pushing features of different classes apart. Experiments on the AISHELL-1 and ESC-50 datasets evaluate voice filtering performance, average classification accuracy, and average retrieval accuracy. The results show that VF-EFENet effectively filters the voice from mixed audio, reaching an SI-SNR of 12.51 dB. At a sampling rate of 8 kHz, the average classification accuracy improves by 8.3% after voice filtering with VF-EFENet. When the top 30 samples are retrieved, the average retrieval accuracy of VF-EFENet is 7.37% higher than that of ESResNetAttention.
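The abstract reports separation quality as SI-SNR, the standard scale-invariant objective for Conv-TasNet-style separation models. As a minimal sketch of how that figure is typically computed (the function name, signal shapes, and test data below are illustrative, not taken from the paper):

```python
import numpy as np

def si_snr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-noise ratio (SI-SNR) in dB.

    Both signals are zero-meaned first, so the metric is invariant
    to DC offset as well as to overall scale.
    """
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to isolate the target component.
    s_target = (np.dot(estimate, reference) / (np.dot(reference, reference) + eps)) * reference
    # Everything orthogonal to the reference counts as residual noise.
    e_noise = estimate - s_target
    return 10 * np.log10(np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps))

# Toy usage: a clean signal versus a lightly noise-corrupted copy.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)              # 1 s of audio at 16 kHz
noisy = clean + 0.1 * rng.standard_normal(16000)
print(f"SI-SNR: {si_snr(noisy, clean):.2f} dB")
```

Under this metric, higher is better; the 12.51 dB reported for VF-EFENet would be computed by comparing the filtered output against the clean environmental-sound reference.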