Abstract

BERT is a pre-trained language model that achieves state-of-the-art performance on natural language processing (NLP) tasks, and since its publication it has become one of the most popular models in the field. The officially recommended way to apply BERT to downstream tasks is fine-tuning. However, we argue that transfer learning is also a very practical way to apply BERT, with advantages of its own over fine-tuning. In this paper, we explore these advantages by applying BERT to the same eight GLUE benchmark tasks with the transfer learning and the fine-tuning approach separately, and comparing the training time and the performance scores (such as accuracy or F1) of the two approaches. We find that, across all eight GLUE tasks, transfer learning on small training sets saves 30% to 50% of the time needed to train the models on the downstream tasks while achieving very similar performance, and that, given the same amount of training time, transfer learning obtains higher performance scores than fine-tuning on larger training sets. In conclusion, compared with fine-tuning, transfer learning offers a large saving in training time on small training sets at approximately the same performance, and it performs better on large training sets when both approaches are given the same amount of time.
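The following is a minimal sketch, not the authors' code, of the two ways of applying BERT that the paper compares, assuming the Hugging Face transformers library and PyTorch, the bert-base-uncased checkpoint, and that "transfer learning" here means using BERT as a frozen feature extractor with only a small task head being trained; the example inputs and the SST-2-style binary labels are illustrative only.

    import torch
    from transformers import AutoModel, AutoTokenizer

    encoder = AutoModel.from_pretrained("bert-base-uncased")
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    num_labels = 2  # e.g. a binary GLUE task such as SST-2 (assumed for illustration)
    head = torch.nn.Linear(encoder.config.hidden_size, num_labels)

    FINE_TUNE = False  # True: fine-tuning; False: transfer learning (frozen BERT)
    if not FINE_TUNE:
        for p in encoder.parameters():
            p.requires_grad = False  # only the task head is updated

    trainable = list(head.parameters()) + (list(encoder.parameters()) if FINE_TUNE else [])
    optimizer = torch.optim.AdamW(trainable, lr=2e-5 if FINE_TUNE else 1e-3)

    # One illustrative training step on a toy batch.
    batch = tokenizer(["a great movie", "a dull movie"], padding=True, return_tensors="pt")
    labels = torch.tensor([1, 0])

    outputs = encoder(**batch)
    cls = outputs.last_hidden_state[:, 0]  # [CLS] token representation
    loss = torch.nn.functional.cross_entropy(head(cls), labels)
    loss.backward()
    optimizer.step()

In the frozen setting, gradients are computed only for the small task head, which is what makes each training step cheaper than full fine-tuning; the backbone's parameters are never updated.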
