Abstract
Recently, pre-trained language models have achieved extraordinary performance on numerous benchmarks. By learning general language knowledge from a large pre-training corpus, a language model can fit a specific downstream task with a relatively small amount of labeled training data in the fine-tuning stage. More remarkably, GPT-3, with 175B parameters, performs well on specific tasks given only natural-language prompts and a few demonstrations of the task. Inspired by the success of GPT-3, we ask whether smaller language models still possess a similar few-shot learning ability. Unlike the delicately designed tasks in previous few-shot learning research, we take a more practical approach. We present a question-answering-based method that helps the language model better understand the text classification task by concatenating a label-related question to each candidate sentence. By leveraging the label-related language knowledge that the model acquired during pre-training, our QA model outperforms traditional binary and multi-class classification approaches on both English and Chinese datasets. We then test our QA model in few-shot learning experiments on pre-trained language models of different sizes, ranging from DistilBERT to RoBERTa-large. We are surprised to find that even DistilBERT, the smallest language model we tested with only 66M parameters, still holds an undeniable few-shot learning ability. Moreover, RoBERTa-large, with 355M parameters, achieves a remarkably high accuracy of 92.18% with only 100 labeled training examples. This result offers a practical guideline: when a new category of labeled data is needed, as few as 100 examples need to be labeled; combined with an appropriate pre-trained model and classification algorithm, reliable classification results can be obtained. Even without any labeled training data, that is, under the zero-shot learning setup, RoBERTa-large still achieves a solid accuracy of 84.84%. Our code is available at https://github.com/ZhangYunchenY/BetterFs.
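To make the input construction concrete, below is a minimal sketch of the QA-style formulation the abstract describes: each candidate sentence is paired with a label-related question, and a sequence classifier scores the pair. The checkpoint, question template, and label names are illustrative assumptions, not necessarily the paper's exact choices; see the linked repository for the actual implementation.

```python
# Hedged sketch of QA-style text classification, assuming a HuggingFace-style
# sequence classifier. Template and labels below are hypothetical examples.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "roberta-large"  # the paper also tests smaller models, e.g. DistilBERT
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# num_labels=2: a yes/no head; it must be fine-tuned (e.g. on ~100 labeled
# examples, as in the paper's few-shot setting) before the scores are meaningful.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

def qa_score(sentence: str, label: str) -> float:
    """Score how well `sentence` matches `label` by concatenating a
    label-related question and reading the positive-class probability."""
    question = f"Is this text about {label}?"  # hypothetical question template
    inputs = tokenizer(question, sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.softmax(dim=-1)[0, 1].item()

def classify(sentence: str, labels: list[str]) -> str:
    """Pick the label whose question the model answers most positively."""
    return max(labels, key=lambda lab: qa_score(sentence, lab))

print(classify("The team won the championship last night.", ["sports", "finance"]))
```

The point of this formulation is that the question puts the label name into the input text itself, so the classifier can draw on label-related knowledge learned during pre-training rather than treating labels as opaque class indices.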