Multi-label unknown intent detection is a challenging task in which each utterance may contain multiple known intents as well as unknown ones. To tackle this challenge, earlier work proposed first predicting the number of intents in the utterance, then comparing it with the results of known intent matching to decide whether the utterance contains unknown intent(s). Although these methods have made remarkable progress on the task, they still suffer from two important issues: (1) utterance encoding alone is inadequate for extracting multiple intents; (2) optimizing the two sub-tasks (intent number prediction and known intent matching) independently leads to inconsistent predictions. In this article, we propose to incorporate segment augmentation, rather than relying on utterance encoding alone, to better detect multiple intents. We also design a prediction consistency module to bridge the gap between the two sub-tasks. Empirical results on the MultiWOZ2.3 and MixSNIPS datasets show that our method achieves state-of-the-art performance and significantly improves over the best baseline.