Abstract

Multi-branch feature aggregation has recently been introduced and shows superior performance for speaker verification. It is often implemented via simple operations, such as element-wise addition or concatenation, which may lead to suboptimal results. In this paper, we propose a novel multi-branch feature aggregation method based on multiple weighting (MBFA-MW), which adaptively learns attention weights for each branch to extract discriminative information that benefits speaker verification. The method comprises two weighting strategies: point attention and channel attention. Point attention learns a point-wise weight to emphasize salient local information in the time–frequency domain, while channel attention learns a channel-wise weight to enhance the correlation between key information and channels in the frequency domain. Spanning the time–frequency and frequency domains, the two strategies complement each other and extract informative features from multiple branches. In addition, we compare different multi-branch feature aggregation methods under the same experimental conditions. Experimental results on the Voxceleb and Cnceleb datasets show that the proposed method achieves performance improvements over other multi-branch feature aggregation methods as well as other mainstream methods.
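
The abstract does not give implementation details, but the described weighting can be illustrated with a minimal PyTorch-style sketch. The module names (`PointAttention`, `ChannelAttention`, `MultiBranchAggregation`), the sigmoid gating, the pooling choices, and the final summation over branches are assumptions for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class PointAttention(nn.Module):
    """Point-wise weighting: one gate per time-frequency position (assumed 1x1-conv gate)."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):                      # x: (B, C, F, T)
        w = torch.sigmoid(self.conv(x))        # (B, 1, F, T) point-wise weights
        return x * w

class ChannelAttention(nn.Module):
    """Channel-wise weighting: squeeze over time-frequency, one gate per channel."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (B, C, F, T)
        s = x.mean(dim=(2, 3))                 # global average pooling -> (B, C)
        w = self.fc(s).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1) channel weights
        return x * w

class MultiBranchAggregation(nn.Module):
    """Apply both attentions to each branch, then aggregate (here: element-wise sum)."""
    def __init__(self, num_branches, channels):
        super().__init__()
        self.point = nn.ModuleList(PointAttention(channels) for _ in range(num_branches))
        self.chan = nn.ModuleList(ChannelAttention(channels) for _ in range(num_branches))

    def forward(self, branches):               # list of (B, C, F, T) tensors
        weighted = [p(c(x)) for p, c, x in zip(self.point, self.chan, branches)]
        return torch.stack(weighted, dim=0).sum(dim=0)

# Usage example with three hypothetical branch feature maps
agg = MultiBranchAggregation(num_branches=3, channels=64)
feats = [torch.randn(2, 64, 80, 200) for _ in range(3)]
out = agg(feats)                               # (2, 64, 80, 200)
```

In this sketch each branch receives its own learned point-wise and channel-wise gates before aggregation, which is the adaptive weighting the abstract contrasts with plain addition or concatenation.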
