Abstract

In recent years, machine learning approaches, and graph learning methods in particular, have achieved strong results in natural language processing, notably in text classification. However, many such models generalize poorly to datasets in other languages. In this research, we investigate graph machine learning methods on non-English datasets, such as the Persian Digikala dataset of user reviews, for the task of text classification. More specifically, we evaluate different combinations of (Pars)BERT with several graph neural network (GNN) architectures (GCN, GAT, and GIN), as well as ensemble learning methods, on well-known non-English text classification datasets. Our analysis and results demonstrate that applying GNN models improves text classification performance by better capturing the topological relationships among textual data. Additionally, our experiments show that models built on language-specific pre-trained models (ParsBERT rather than BERT) represent the data more effectively and achieve higher accuracy.
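To make the described pipeline concrete, the sketch below shows one plausible way to combine ParsBERT sentence embeddings with a GCN classifier using HuggingFace Transformers and PyTorch Geometric. This is not the authors' implementation: the checkpoint name, the toy graph construction, and all dimensions are assumptions for illustration only.

```python
# Minimal sketch (assumed, not the paper's code): ParsBERT [CLS] embeddings as
# node features, two GCN layers, and mean pooling for document classification.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel
from torch_geometric.nn import GCNConv, global_mean_pool

# Assumed ParsBERT checkpoint identifier on the HuggingFace Hub.
MODEL_NAME = "HooshvareLab/bert-base-parsbert-uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
bert = AutoModel.from_pretrained(MODEL_NAME)

def embed(texts):
    """Encode each text into a fixed-size [CLS] embedding (one node per text)."""
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = bert(**enc)
    return out.last_hidden_state[:, 0, :]  # shape: (num_nodes, 768)

class GCNClassifier(torch.nn.Module):
    def __init__(self, hidden=256, num_classes=2):
        super().__init__()
        self.conv1 = GCNConv(768, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.lin = torch.nn.Linear(hidden, num_classes)

    def forward(self, x, edge_index, batch):
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        x = global_mean_pool(x, batch)  # one vector per document graph
        return self.lin(x)

# Toy usage: a single document graph with three text nodes and assumed edges.
x = embed(["review sentence one", "review sentence two", "review sentence three"])
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]])  # assumed adjacency
batch = torch.zeros(x.size(0), dtype=torch.long)
logits = GCNClassifier()(x, edge_index, batch)
```

Swapping GCNConv for GATConv or GINConv would give the GAT and GIN variants mentioned in the abstract; ensembling would then combine the predictions of these variants.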
