Abstract
State-of-the-art semi-supervised learning frameworks have shown great potential for making deep, complex language models such as BERT highly effective for text classification when labeled data is limited. However, the large size and slow inference of such models can hinder their deployment in resource-limited or real-time settings. In this paper, we propose a new semi-supervised learning approach that distills a large, complex teacher model into a lightweight student model capable of acquiring knowledge from different layers of the teacher through $K$-way projecting networks. Across four English text classification benchmarks and one dataset collected from a Chinese online course, our experiments show that the student model achieves results comparable to state-of-the-art Transformer-based semi-supervised text classification methods, while using only 0.156 MB of parameters and running inference 785 times faster than the teacher model.
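To make the layer-wise distillation idea concrete, the sketch below shows one plausible reading of "$K$-way projecting networks": $K$ separate linear heads map the student's hidden representation into the spaces of $K$ selected teacher layers, and the student is trained to match each of them. The class and function names, the choice of a linear projection, and the mean-squared-error objective are all illustrative assumptions, not the paper's confirmed implementation.

```python
import torch
import torch.nn as nn


class KWayProjection(nn.Module):
    """Hypothetical sketch of K-way projecting networks: one projection
    head per targeted teacher layer, mapping the student's hidden state
    into that teacher layer's representation space."""

    def __init__(self, student_dim: int, teacher_dim: int, k: int):
        super().__init__()
        # One linear projection per targeted teacher layer (assumed form).
        self.projections = nn.ModuleList(
            nn.Linear(student_dim, teacher_dim) for _ in range(k)
        )

    def forward(self, student_hidden: torch.Tensor) -> list[torch.Tensor]:
        # Produce K views of the student representation, one per teacher layer.
        return [proj(student_hidden) for proj in self.projections]


def layer_distillation_loss(projected, teacher_hiddens):
    # Assumed objective: mean-squared error between each projected student
    # view and the hidden state of the corresponding teacher layer.
    mse = nn.MSELoss()
    return sum(mse(p, t) for p, t in zip(projected, teacher_hiddens))
```

Under this reading, the student stays small because only the projection heads (discarded after training) grow with $K$, while the student backbone itself remains a single lightweight network.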