Abstract

The few-shot speaker recognition task is to identify speakers from a limited number of support samples. We argue that query samples and support samples are both informative for classification. To help Prototypical Networks capture information from query samples, this paper proposes the relation-based indefinite distance metric attentive correction prototype network (RACP). Because the mean prototype deviates from the ideal prototype, we calculate attention scores for each query sample and use them to build a customized attention prototype. Then, to recover information from the query samples that would otherwise be missed, the prototype is further refined with correction data constructed by combining the query samples with the global class attention score. Finally, the indefinite distance metric of Relation Networks is introduced into Prototypical Networks, and the relation scores between the sample prototypes and the query samples are calculated for the final prediction. Compared with existing methods, RACP considers both query samples and support samples instead of ignoring the query samples. We compare RACP with strong baselines (e.g., GMM-SVM, MAML, Prototypical Networks, Res32, and VGG11). We also conduct an ablation study and a generalizability study across different scenarios and datasets. Results show that RACP achieves better performance and generalization ability than these baselines.
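To make the difference between a mean prototype and a query-specific attention prototype concrete, the following is a minimal sketch, not the RACP implementation. The abstract does not state how attention scores are computed, so the dot-product scoring with a softmax used here is an assumption, and the function names `mean_prototype` and `attention_prototype` are hypothetical. The correction step and the learned relation module are omitted because the abstract does not give enough detail to reproduce them.

```python
import math


def mean_prototype(support):
    """Element-wise mean of one class's support embeddings
    (the standard Prototypical Networks prototype)."""
    n = len(support)
    return [sum(column) / n for column in zip(*support)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_prototype(support, query):
    """Query-specific prototype for one class.

    ASSUMPTION: attention scores are dot products between the query
    embedding and each support embedding, normalized with a softmax.
    The paper's actual scoring function may differ.
    """
    scores = [sum(s_d * q_d for s_d, q_d in zip(s, query)) for s in support]
    weights = softmax(scores)
    dim = len(query)
    return [sum(w * s[d] for w, s in zip(weights, support)) for d in range(dim)]


# Toy embeddings for a single class with two support samples.
support = [[1.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.0]

mean_proto = mean_prototype(support)
attn_proto = attention_prototype(support, query)
```

In this toy case the mean prototype weights both support samples equally, while the attention prototype shifts toward the support sample that is more similar to the query, which is the intuition behind customizing the prototype for each query.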

