Abstract

In text classification task, models have shown remarkable accuracy across various datasets. However, confusion often arises when certain categories within the dataset are too similar, causing misclassification of certain samples. This paper proposes an improved method for this problem, through the creation of a three-layer text graph for the corpus, which is used to calculate the Category Correlation Matrix (CCM). Additionally, this paper introduces category-adaptive contrastive learning for text embedding from the encoder, enhancing the model’s ability to distinguish between samples in confusable categories that are easily confused. Soft labels are generated using this matrix to guide the classifier, preventing the model from becoming overconfident with one-hot vectors. The efficacy of this approach was demonstrated through experimental evaluations on three text encoders and six different datasets.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call