Abstract
Deep learning methods have been widely applied to English text classification in recent years, achieving strong performance. However, current methods face two significant challenges: (1) they struggle to capture long-range contextual structure within text sequences, and (2) they do not adequately integrate linguistic knowledge into representations to improve classifier performance. To this end, a novel multilingual pre-training based multi-feature fusion method for English text classification (MFFMP-ETC) is proposed. Specifically, MFFMP-ETC consists of multilingual feature extraction, multilevel structure learning, and multi-view representation fusion. MFFMP-ETC uses Multilingual BERT as a deep semantic extractor to introduce language information into representation learning, which makes the text representations markedly more robust. It then integrates Bi-LSTM and TextCNN into the multilingual pre-training architecture to capture the global and local structure of English texts by modelling bidirectional contextual semantic dependencies and multi-granularity local semantic dependencies. Meanwhile, MFFMP-ETC devises multi-view representation fusion within invariant semantic learning to aggregate consistent and complementary information across views. By synergistically integrating Multilingual BERT's deep semantic features, Bi-LSTM's bidirectional context modelling, and TextCNN's local feature extraction, MFFMP-ETC offers a more comprehensive and effective solution for capturing long-distance dependencies and nuanced contextual information in text classification. Finally, results on three datasets show that MFFMP-ETC establishes a new baseline in terms of accuracy, sensitivity, and precision, verifying the effectiveness and advancement of MFFMP-ETC in text classification.
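To make the three-branch pipeline concrete, the sketch below shows one plausible reading of the architecture in PyTorch: Multilingual BERT produces token embeddings, a Bi-LSTM branch summarizes global bidirectional context, a TextCNN branch extracts multi-granularity local features, and the two views are fused for classification. The model name, layer sizes, and the simple concatenation-based fusion are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the MFFMP-ETC idea (hypothetical implementation; layer
# sizes and the concatenation fusion are assumptions, not the paper's code).
import torch
import torch.nn as nn
from transformers import AutoModel

class MFFMPETCSketch(nn.Module):
    def __init__(self, num_classes: int, lstm_hidden: int = 256,
                 cnn_channels: int = 128, kernel_sizes=(3, 4, 5)):
        super().__init__()
        # Multilingual BERT as the deep semantic extractor.
        self.bert = AutoModel.from_pretrained("bert-base-multilingual-cased")
        d = self.bert.config.hidden_size  # 768
        # Bi-LSTM branch: bidirectional (global) contextual dependencies.
        self.bilstm = nn.LSTM(d, lstm_hidden, batch_first=True,
                              bidirectional=True)
        # TextCNN branch: multi-granularity local features via parallel
        # convolutions with different kernel sizes over the token dimension.
        self.convs = nn.ModuleList(
            nn.Conv1d(d, cnn_channels, k, padding=k // 2)
            for k in kernel_sizes)
        fused_dim = 2 * lstm_hidden + cnn_channels * len(kernel_sizes)
        self.classifier = nn.Linear(fused_dim, num_classes)

    def forward(self, input_ids, attention_mask):
        h = self.bert(input_ids=input_ids,
                      attention_mask=attention_mask).last_hidden_state  # (B, T, d)
        # Global view: final Bi-LSTM hidden states from both directions.
        _, (hn, _) = self.bilstm(h)
        global_view = torch.cat([hn[0], hn[1]], dim=-1)  # (B, 2*lstm_hidden)
        # Local view: max-pool each convolutional feature map over time.
        c = h.transpose(1, 2)  # (B, d, T) for Conv1d
        local_view = torch.cat(
            [torch.relu(conv(c)).max(dim=-1).values for conv in self.convs],
            dim=-1)  # (B, cnn_channels * len(kernel_sizes))
        # Multi-view fusion: a simple concatenation stands in for the paper's
        # fusion of consistent and complementary information across views.
        return self.classifier(torch.cat([global_view, local_view], dim=-1))
```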