Abstract

In recent years, alongside rapid advances in deep learning for natural language processing (NLP), notable multilingual pre-trained language models have been proposed. These multilingual text analysis and mining models have demonstrated state-of-the-art performance on several fundamental NLP tasks, including cross-lingual text classification (CLC). However, these multilingual pre-trained language models still face limitations when fine-tuned for specific downstream tasks in low-resource languages. They also struggle to preserve global semantics (e.g., topics) and long-range relationships between words, both of which are important for effective fine-tuning on the cross-lingual text classification task. To meet these challenges, in this article we propose a novel topic-driven multi-typed text graph attention–based representation learning method for the cross-lingual text classification problem, called TG-CTC. In the proposed TG-CTC model, we utilize a novel fused topic-driven multi-typed text graph representation to jointly learn the rich structural and global semantic information of texts to effectively handle the CLC task. More specifically, we integrate a heterogeneous text graph attention network with neural topic modelling to enrich the semantic information of the learned textual representations across multiple languages. Extensive experiments on benchmark multilingual datasets demonstrate the effectiveness of the proposed TG-CTC model compared with contemporary state-of-the-art baselines.
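To make the described architecture concrete, the sketch below illustrates the general idea of fusing graph attention over a text graph with a neural topic model for classification. This is a minimal, hypothetical PyTorch sketch, not the authors' implementation: all module and parameter names (GraphAttentionLayer, NeuralTopicEncoder, TGCTCSketch, num_topics, hidden_dim, doc_idx) are assumptions, and the actual TG-CTC fusion scheme, graph schema, and training objective may differ.

```python
# Illustrative sketch only (not the paper's code): a graph attention layer
# over document/word nodes, a VAE-style neural topic encoder over bag-of-words
# vectors, and a classifier over the fused representation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphAttentionLayer(nn.Module):
    """Single-head graph attention over a dense adjacency matrix.

    Assumes `adj` includes self-loops so every node attends to at least itself.
    """

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) node features; adj: (N, N) 0/1 adjacency.
        h = self.W(x)                                          # (N, out_dim)
        n = h.size(0)
        # Pairwise attention logits e_ij = a([h_i ; h_j]).
        hi = h.unsqueeze(1).expand(n, n, -1)
        hj = h.unsqueeze(0).expand(n, n, -1)
        e = F.leaky_relu(self.a(torch.cat([hi, hj], dim=-1))).squeeze(-1)
        # Mask non-edges so attention flows only along graph edges.
        e = e.masked_fill(adj == 0, float("-inf"))
        alpha = torch.softmax(e, dim=-1)                       # (N, N)
        return F.elu(alpha @ h)                                # (N, out_dim)


class NeuralTopicEncoder(nn.Module):
    """VAE-style encoder mapping a bag-of-words vector to a topic mixture."""

    def __init__(self, vocab_size: int, num_topics: int):
        super().__init__()
        self.fc = nn.Linear(vocab_size, 256)
        self.mu = nn.Linear(256, num_topics)
        self.logvar = nn.Linear(256, num_topics)

    def forward(self, bow: torch.Tensor) -> torch.Tensor:
        h = F.relu(self.fc(bow))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterize
        return torch.softmax(z, dim=-1)                        # topic proportions


class TGCTCSketch(nn.Module):
    """Fuses graph-attentive document nodes with topic vectors for classification."""

    def __init__(self, feat_dim, vocab_size, num_topics, hidden_dim, num_classes):
        super().__init__()
        self.gat = GraphAttentionLayer(feat_dim, hidden_dim)
        self.ntm = NeuralTopicEncoder(vocab_size, num_topics)
        self.clf = nn.Linear(hidden_dim + num_topics, num_classes)

    def forward(self, node_feats, adj, doc_bow, doc_idx):
        nodes = self.gat(node_feats, adj)      # embeddings for all graph nodes
        topics = self.ntm(doc_bow)             # per-document topic mixtures
        # Select document nodes and concatenate their topic vectors.
        fused = torch.cat([nodes[doc_idx], topics], dim=-1)
        return self.clf(fused)                 # class logits per document
```

The sketch uses simple concatenation to fuse structural and topical signals; the paper's heterogeneous graph would additionally distinguish node and edge types (e.g., document, word, and cross-lingual links), and training would typically combine the classification loss with the topic model's reconstruction objective.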
