Convolutional Neural Networks (CNNs) have demonstrated promising performance in many NLP tasks owing to their excellent local feature-extraction capability. Many previous works have made word-level 2D CNNs deeper to capture global representations of text. Three-dimensional CNNs perform excellently in CV tasks through spatiotemporal feature learning, though they are little utilized in text classification task. This paper proposes a simple, yet effective, approach for hierarchy feature learning using 3D CNN in text classification tasks, named Text3D. Text3D efficiently extracts rich information through text representations structured in three dimensions produced by pretrained language model BERT. Specifically, our Text3D utilizes word order, word embedding and hierarchy information of BERT encoder layers as features of three dimensions. The proposed model with 12 layers outperforms the baselines on four benchmark datasets for sentiment classification and topic categorization. Text3D with a different hierarchy of output from BERT layers demonstrates that the linguistic features from different layers have varied effects on text classification.
Read full abstract