Abstract

This paper presents a model that integrates a BERT encoder with a capsule network, replacing the fully connected layer that BERT traditionally uses for downstream classification with a capsule layer. The capsule layer consists of three main modules: a representation module, a probability module, and a reconstruction module. It transforms the output of BERT's final hidden layer into activation probabilities for each capsule, and the text is classified according to the capsule with the highest activation. The model was applied to sentiment analysis and text classification tasks, and its test results surpassed those of several BERT variants on all metrics. To examine how the model handles multiple entities and complex relationships, sentences with high ambiguity were extracted and the probability distribution over all capsules was compared with that of RNN-Capsule. For BERT-Capsule, the probability of the activated capsule was significantly higher than the others, and the gap was more pronounced than for RNN-Capsule, indicating the model's strong ability to process ambiguous information.
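The abstract describes the architecture only at a high level. The sketch below is one minimal way such a capsule head could replace BERT's classification layer, assuming PyTorch and the HuggingFace transformers library; the class names (CapsuleLayer, BertCapsule), the per-class capsule design, the tanh and sigmoid activations, and the use of the pooled [CLS] output are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a BERT-Capsule classifier, assuming PyTorch and
# HuggingFace transformers. Module names, dimensions, and activations are
# illustrative assumptions; the paper's implementation may differ.
import torch
import torch.nn as nn
from transformers import BertModel


class CapsuleLayer(nn.Module):
    """One capsule per class, built from the three modules the abstract
    names: representation, probability, and reconstruction."""

    def __init__(self, hidden_size: int, num_classes: int, capsule_dim: int = 128):
        super().__init__()
        # Representation module: one capsule representation per class.
        self.representation = nn.ModuleList(
            nn.Linear(hidden_size, capsule_dim) for _ in range(num_classes)
        )
        # Probability module: maps each capsule to an activation probability.
        self.probability = nn.ModuleList(
            nn.Linear(capsule_dim, 1) for _ in range(num_classes)
        )
        # Reconstruction module: rebuilds the encoder output from each capsule
        # (typically driving an auxiliary reconstruction loss during training).
        self.reconstruction = nn.ModuleList(
            nn.Linear(capsule_dim, hidden_size) for _ in range(num_classes)
        )

    def forward(self, hidden: torch.Tensor):
        # hidden: (batch, hidden_size), BERT's final-layer pooled output.
        caps = [torch.tanh(rep(hidden)) for rep in self.representation]
        probs = torch.cat(
            [torch.sigmoid(p(c)) for p, c in zip(self.probability, caps)], dim=-1
        )  # (batch, num_classes) capsule activation probabilities
        recons = torch.stack(
            [r(c) for r, c in zip(self.reconstruction, caps)], dim=1
        )  # (batch, num_classes, hidden_size)
        return probs, recons


class BertCapsule(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.capsules = CapsuleLayer(self.bert.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        probs, recons = self.capsules(out.pooler_output)
        return probs, recons  # predicted class: probs.argmax(dim=-1)
```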
