Preciser comparison: Augmented multi-layer dynamic contrastive strategy for text2text question classification

Jiyao Wang, Zijie Chen, Yijia Zhang, Dengbo He, Fangzhen Lin

doi:10.1016/j.neucom.2023.126299

Abstract

Text2text question classification (TQC), a particular application of question classification (QC), is of great practical value. Traditional QC methods usually label the category of a question using one or more keywords provided by users. In TQC, by contrast, each natural-language question is automatically assigned to one of a set of pre-designed standard question classes, each of which is expressed as a short text. Because of this characteristic, TQC requires a dedicated framework and must be trained and validated on customized datasets. Previous TQC work has mainly relied on textual similarity matching. However, no effective pairwise learning paradigm has yet been proposed for TQC to model the correlations between input texts and classes, and the influence of distance metrics and loss functions on TQC has not been investigated. In this work, we propose a novel and comprehensive strategy, Augmented Dynamic Multi-layer Contrastive (ADMC), to address these challenges. Our framework consists of (1) an optional data augmentation module, (2) a dynamic negative sampling stage, and (3) a precise matching stage. The TQC framework with the ADMC strategy addresses data imbalance and explores distance metric learning through multiple augmentation options and dynamic negative sampling based on multi-layer contrastive learning. To compensate for the shortage of public datasets for this task, we collected two real-world datasets and adaptively expanded three existing public datasets, which will be made available after data masking. Experimental results show that ADMC outperforms the baseline methods investigated in this paper. The code is available at https://github.com/WJULYW/ADMC.
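The abstract does not specify ADMC's encoder, distance metric, or loss, so the following is only an illustrative sketch of the general idea of casting TQC as pairwise matching between a question and short-text classes, with negatives re-sampled from the current embeddings. It assumes question and class embeddings are already produced by some encoder; cosine similarity, the InfoNCE-style loss, the temperature `tau`, the helper names (`cosine`, `hard_negatives`, `contrastive_loss`, `classify`), and the toy vectors are all hypothetical choices of ours, not the authors' implementation (see their repository for that).

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hard_negatives(query: Vector, classes: List[Vector], positive: int, k: int) -> List[int]:
    """Hypothetical helper: indices of the k wrong classes most similar to the query.

    Re-running this with the current embeddings at every training step is what
    makes the negative sampling "dynamic" in this sketch.
    """
    candidates = [i for i in range(len(classes)) if i != positive]
    candidates.sort(key=lambda i: cosine(query, classes[i]), reverse=True)
    return candidates[:k]


def contrastive_loss(query: Vector, classes: List[Vector], positive: int,
                     negatives: List[int], tau: float = 0.1) -> float:
    """InfoNCE-style loss: -log softmax probability of the positive class
    among the positive and the sampled negatives (assumed loss, not ADMC's)."""
    logits = [cosine(query, classes[positive]) / tau]
    logits += [cosine(query, classes[i]) / tau for i in negatives]
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[0]


def classify(query: Vector, classes: List[Vector]) -> int:
    """Assign the question to the standard class with the highest similarity."""
    return max(range(len(classes)), key=lambda i: cosine(query, classes[i]))


if __name__ == "__main__":
    # Made-up 3-d embeddings for four standard question classes.
    classes = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    query = [0.95, 0.05, 0.0]  # a question whose true class is 0
    negs = hard_negatives(query, classes, positive=0, k=2)
    print(negs)  # [1, 2]: class 1 is the hardest (most confusable) negative
    print(contrastive_loss(query, classes, 0, negs))
    print(classify(query, classes))  # 0
```

In this toy example class 1 is nearly identical to the true class, which is exactly the kind of confusable negative that a fixed random sampler would rarely pick but a similarity-based sampler selects first.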
