Can bi-encoders, without additional fine-tuning, achieve performance comparable to fine-tuned BERT models on classification tasks? To answer this question, we present a simple yet effective approach to text classification using bi-encoders that requires no fine-tuning. Our main observation is that state-of-the-art bi-encoders exhibit varying performance across datasets. Our proposed approaches therefore prepare multiple bi-encoders in advance and, when a new dataset is provided, select and ensemble the ones most appropriate for that dataset. Experimental results show that, for text classification on subsets of the AG News, SMS Spam Collection, Stanford Sentiment Treebank v2, and TREC Question Classification datasets, the proposed approaches achieve performance comparable to fine-tuned BERT-Base, DistilBERT-Base, ALBERT-Base, and RoBERTa-Base. For instance, using the well-known bi-encoder model all-MiniLM-L12-v2 without additional optimization resulted in an average accuracy of 77.84%, which improved to 89.49% with the proposed adaptive selection and ensemble techniques and further increased to 91.96% when combined with the RoBERTa-Base model. We believe this approach will be particularly useful in fields such as K-12 AI programming education, where pre-trained models are applied to small datasets without fine-tuning.
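The abstract leaves the classification and ensembling details to the full paper, so the following is only a minimal sketch of how a fine-tuning-free, embedding-based classifier over an ensemble of bi-encoders might look. The label-similarity scoring and the plain averaging of scores across encoders are illustrative assumptions, not the paper's adaptive selection method, and the second encoder name (all-mpnet-base-v2) is just an example.

```python
# Sketch: text classification with pre-trained bi-encoders and no fine-tuning.
# Assumptions (not from the abstract): each text is assigned the class whose
# label embedding is most cosine-similar to the text embedding, and the
# "ensemble" simply averages similarity scores across encoders.
import numpy as np
from sentence_transformers import SentenceTransformer

# Any pool of pre-trained bi-encoders could be used; these names are examples.
ENCODER_NAMES = ["all-MiniLM-L12-v2", "all-mpnet-base-v2"]
LABELS = ["World", "Sports", "Business", "Sci/Tech"]  # e.g., AG News classes


def classify(texts, labels, encoder_names=ENCODER_NAMES):
    """Predict a label index per text by averaging label-similarity scores
    over all encoders (a simple stand-in for adaptive selection/ensembling)."""
    scores = np.zeros((len(texts), len(labels)))
    for name in encoder_names:
        model = SentenceTransformer(name)
        text_emb = model.encode(texts, normalize_embeddings=True)
        label_emb = model.encode(labels, normalize_embeddings=True)
        scores += text_emb @ label_emb.T  # cosine similarity of unit vectors
    return scores.argmax(axis=1)


preds = classify(["The striker scored twice in the final."], LABELS)
print(LABELS[preds[0]])  # expected: "Sports"
```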