Abstract

This work presents a resource-efficient solution for spoken dialect recognition under semi-open set evaluation scenarios, where a closed-set model is exposed to inputs from unknown classes. Our experiments focus primarily on Task 2 of the OLR 2020 challenge, in which three Chinese dialects, Hokkien, Sichuanese, and Shanghainese, are to be recognized. The evaluation data includes utterances from other, unknown classes alongside the three target dialects. We find that neither the top-performing submissions nor the baseline system proposed solutions that explicitly address the semi-open set scenario. This work pays special attention to the semi-open set nature of the problem and analyzes how unknown utterances can degrade overall performance if they are not treated separately. We train our main dialect classifier with the ECAPA-TDNN architecture on 40-dimensional MFCC features from the training data of the three dialects. We propose a confidence-assessment algorithm and combine the TDNN performance from both the end-to-end and embedding-extractor approaches. We then frame the semi-open set scenario as a constrained optimization problem. By solving it, we prove that the performance degradation caused by unknown utterances is minimized when the corresponding softmax prediction is equally confused among the target outputs. Based on this criterion, we develop different feedback modules in our system. These modules work on novelty detection principles and flag unknown-class utterances as anomalies. The prediction score of a flagged utterance is then penalized by flattening. The proposed system achieves a Cavg(×100) score of 8.50 and an EER (%) of 9.77. Averaged over both metrics, the score of our system outperforms that of the winning submission. Owing to the proposed semi-open set adaptations, our system achieves this performance with far less training data and fewer computational resources than the top-performing submissions.
Additionally, to verify the broader applicability of the proposed semi-open set solution, we experiment with two other dialect recognition tasks covering English and Arabic languages and larger database sizes.
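
The flattening penalty described above can be illustrated with a minimal Python sketch. The function name `flatten_scores`, the `strength` parameter, and the linear interpolation toward the uniform distribution are assumptions made for illustration only; the abstract does not specify how the penalty is computed, and the `is_unknown` flag simply stands in for the decision of a novelty-detection feedback module.

```python
import numpy as np


def flatten_scores(probs, is_unknown, strength=1.0):
    """Pull a softmax vector toward the uniform distribution when flagged.

    probs      : softmax scores over the target dialects (sums to 1)
    is_unknown : hypothetical flag from a novelty-detection module
    strength   : hypothetical knob; 0 leaves the scores untouched,
                 1 makes them fully uniform ("equally confused")
    """
    probs = np.asarray(probs, dtype=float)
    if not is_unknown:
        return probs
    uniform = np.full_like(probs, 1.0 / probs.size)
    # A convex combination of two distributions is still a distribution.
    return (1.0 - strength) * probs + strength * uniform


# Scores over (Hokkien, Sichuanese, Shanghainese) for one utterance.
scores = [0.7, 0.2, 0.1]
kept = flatten_scores(scores, is_unknown=False)
flattened = flatten_scores(scores, is_unknown=True)
partially_flattened = flatten_scores(scores, is_unknown=True, strength=0.5)
```

With `strength=1.0`, a flagged utterance receives the same score for every target dialect, which matches the equal-confusion criterion stated in the abstract; intermediate values are one possible softer variant, not something the abstract describes.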
