Abstract

Text classification is an important task in natural language processing. Multilayer attention networks have achieved excellent performance on text classification tasks, but they face challenges such as high time and space complexity and the low-rank bottleneck problem. This paper incorporates spatial attention into a neural network architecture that uses fewer encoder layers. The proposed model aims to enhance the spatial information of semantic features while reducing the high time and space demands of traditional multilayer attention networks. Spatial attention selectively weights the spatial locations in the input feature maps, enabling the model to focus on the most informative regions while ignoring less important ones. By incorporating spatial attention into a shallower encoder network, the proposed model achieves improved performance on spatially oriented tasks while reducing the computational overhead associated with deeper attention-based models. To alleviate the low-rank bottleneck problem of multihead attention, this paper proposes a variable multihead attention mechanism, which varies the number of attention heads from layer to layer across the encoder, achieving a balance between expressive power and computational efficiency. We use two Chinese text classification datasets and an English sentiment classification dataset to verify the effectiveness of the proposed model.
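
The following is a minimal PyTorch sketch of the two ideas described above: a 1D spatial attention module that scores sequence positions, and a shallow encoder whose head count changes layer by layer. The module names, CBAM-style pooling, head schedule, and tensor shapes are illustrative assumptions, not the authors' exact implementation.

```python
# Illustrative sketch only; shapes, names, and the head schedule are assumptions.
import torch
import torch.nn as nn


class SpatialAttention1D(nn.Module):
    """Weights sequence positions: pool across channels, score each position."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv1d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, seq_len)
        avg = x.mean(dim=1, keepdim=True)             # (batch, 1, seq_len)
        mx, _ = x.max(dim=1, keepdim=True)            # (batch, 1, seq_len)
        score = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * score                              # emphasize informative positions


class VariableHeadEncoder(nn.Module):
    """Stacks encoder layers whose head count varies layer by layer.
    Fewer heads imply a larger per-head dimension (d_model / h), which raises
    the rank of each head's attention map and eases the low-rank bottleneck."""
    def __init__(self, d_model: int = 256, head_schedule=(8, 4, 2)):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=h, batch_first=True)
            for h in head_schedule
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


# Usage: spatial attention on embedded text, then the shallow variable-head encoder.
emb = torch.randn(4, 256, 128)                        # (batch, d_model, seq_len)
feat = SpatialAttention1D()(emb).transpose(1, 2)      # -> (batch, seq_len, d_model)
out = VariableHeadEncoder()(feat)                     # (batch, seq_len, d_model)
```

With the assumed schedule (8, 4, 2) and d_model = 256, the per-head dimension grows from 32 to 64 to 128 across the three layers, trading head count for per-head rank while keeping the overall parameter budget comparable.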
