Abstract
Noise that was not seen during training is difficult to anticipate, and various approaches have been developed to address this problem. In our earlier work, we proposed a lightweight dynamic filter that splits the filter into a kernel part and a spatial part. This small-footprint model showed robust results in unseen noisy environments. However, the simple pooling process used to divide the features may limit its performance. In this paper, we propose an efficient dynamic filter that improves on the existing one. Instead of taking a simple feature mean, we split the input features into non-overlapping chunks and apply separable convolutions along each feature direction. We also propose an attention pooling method based on the dynamic filter. These methods are applied to the kernel part of our previous work, and experiments are carried out on keyword spotting and speaker verification. The results confirm that the proposed method performs better in unseen environments than recently developed models.
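The contrast between mean pooling and attention pooling over non-overlapping chunks can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the tensor layout (channels, frequency, time), the chunk size, and the helper names `split_chunks` and `attention_pool` are assumptions made for illustration, the separable convolutions are omitted, and the attention scores are a placeholder for whatever the dynamic filter would produce.

```python
import numpy as np

def split_chunks(x, chunk, axis):
    """Split `x` into non-overlapping chunks of length `chunk` along `axis`.

    The chosen axis of length n becomes two axes: (n // chunk, chunk).
    """
    axis = axis % x.ndim
    n = x.shape[axis]
    if n % chunk != 0:
        raise ValueError("axis length must be divisible by the chunk size")
    new_shape = x.shape[:axis] + (n // chunk, chunk) + x.shape[axis + 1:]
    return x.reshape(new_shape)

def softmax(z, axis):
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(x, scores, axis):
    """Weighted average of `x` along `axis`, with softmax(scores) as weights."""
    weights = softmax(scores, axis)
    return (weights * x).sum(axis=axis)

# Hypothetical feature map: 4 channels, 8 frequency bins, 12 time frames.
rng = np.random.default_rng(0)
features = rng.standard_normal((4, 8, 12))

# Non-overlapping chunks of 3 frames along the time axis: shape (4, 8, 4, 3).
chunks = split_chunks(features, chunk=3, axis=2)

# Baseline: simple mean within each chunk.
mean_pooled = chunks.mean(axis=-1)

# Attention pooling: here the features act as their own scores (an assumption);
# in a trained model the scores would come from learned parameters.
attn_pooled = attention_pool(chunks, scores=chunks, axis=-1)
```

With uniform scores, attention pooling reduces to the chunk mean, so the mean can be viewed as a special case of the weighted scheme.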