Abstract

Text2text question classification (TQC), a particular application of question classification (QC), is of great practical value. Traditional QC methods usually label question categories with one or more keywords provided by users. In TQC, by contrast, each natural-language question is automatically categorized into pre-designed standard question classes, which are themselves encoded as short texts. Because of this characteristic, TQC requires a specifically designed framework and should be trained and validated on customized experimental datasets. Previous work on TQC has mainly relied on textual similarity matching. However, no effective pairwise learning paradigm has been proposed for TQC to model the correlations between input texts and classes, and the influence of distance metrics and loss functions on TQC has not been investigated. In this work, we propose Augmented Dynamic Multi-layer Contrastive (ADMC), a novel and comprehensive strategy for addressing these challenges in TQC. Our framework consists of (1) an optional data augmentation module, (2) a dynamic negative sampling stage, and (3) a precise matching stage. Through multiple augmentation options and dynamic negative sampling based on multi-layer contrastive learning, the framework addresses data imbalance and explores distance metric learning. To compensate for the shortage of public datasets for this task, we collected two real-world datasets and adaptively expanded three existing public datasets, which will be made available after data masking. Experimental results show that ADMC outperforms the other baseline methods investigated in this paper. The code is available at https://github.com/WJULYW/ADMC.
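The abstract describes pairwise learning between a question and text-coded classes, with dynamic negative sampling feeding a contrastive objective. The sketch below illustrates only that general idea and is not the ADMC implementation (the linked repository holds the authors' code). It assumes precomputed, non-zero embedding vectors, cosine similarity as the metric, and an InfoNCE-style loss over the k hardest negatives; the names `hardest_negatives`, `contrastive_loss`, `k`, and `temperature` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def hardest_negatives(query: Sequence[float],
                      classes: List[Sequence[float]],
                      positive_idx: int,
                      k: int) -> List[int]:
    """Hypothetical dynamic negative sampling: return the indices of the k
    wrong classes currently most similar to the question embedding."""
    scored = [(cosine(query, c), i)
              for i, c in enumerate(classes) if i != positive_idx]
    scored.sort(reverse=True)  # most similar (hardest) negatives first
    return [i for _, i in scored[:k]]


def contrastive_loss(query: Sequence[float],
                     classes: List[Sequence[float]],
                     positive_idx: int,
                     k: int = 2,
                     temperature: float = 0.1) -> float:
    """InfoNCE-style loss: negative log-softmax of the positive class
    against the k mined negatives (assumed objective, not ADMC's)."""
    negs = hardest_negatives(query, classes, positive_idx, k)
    pos_logit = cosine(query, classes[positive_idx]) / temperature
    logits = [pos_logit] + [cosine(query, classes[i]) / temperature for i in negs]
    m = max(logits)  # log-sum-exp trick for numerical stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - pos_logit


if __name__ == "__main__":
    question = [1.0, 0.0]
    class_embeddings = [[1.0, 0.1], [0.9, 0.5], [-1.0, 0.0], [0.0, 1.0]]
    print(hardest_negatives(question, class_embeddings, positive_idx=0, k=2))
    print(contrastive_loss(question, class_embeddings, positive_idx=0))
```

Because negatives are re-mined from the current embeddings on every call, the sampled set changes as the model's representations change, which is what makes the sampling "dynamic" in this sketch. Swapping `cosine` for another similarity or distance function is the kind of metric choice the abstract says had not been studied for TQC.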
