Abstract

Since 2005, the deep learning community has had access to models that take graphs as input. Naturally, the natural language processing (NLP) community began applying this technique to text. However, one challenge that graph neural networks (GNNs) face is sensitivity to the representation format. Since different graphs can represent the same text, the model’s performance may change depending on the representation used. Although many practitioners share this intuition, few works address this aspect of GNNs. Therefore, we explore twelve text representation strategies that build graphs from text and feed them to the same GNN to investigate how the choice of graph affects the results. We divide these strategies into four groups: reading order, dependency-based, binary tree, and graph of words; the binary tree group was created for this paper. In our tests, we observed that the dependency-based representations tend to achieve the best performance: they keep us competitive on five relevant datasets and surpass the state of the art on another. These results suggest that representation tuning can be a valuable technique for improving a deep learning model.
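To make the idea of competing text-to-graph strategies concrete, the sketch below contrasts a reading-order graph (consecutive tokens connected) with a dependency-based graph (tokens connected to their syntactic heads). This is an illustrative example only, not the paper's exact construction; it assumes spaCy with the "en_core_web_sm" model and networkx are available.

```python
# Illustrative sketch (assumptions: spaCy "en_core_web_sm" model installed,
# networkx available). Shows two of the four strategy groups named above.
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")


def reading_order_graph(text: str) -> nx.Graph:
    """Connect each token to the next one, following the reading order."""
    doc = nlp(text)
    g = nx.Graph()
    g.add_nodes_from((tok.i, {"text": tok.text}) for tok in doc)
    g.add_edges_from((tok.i, tok.i + 1) for tok in doc[:-1])
    return g


def dependency_graph(text: str) -> nx.Graph:
    """Connect each token to its syntactic head from the dependency parse."""
    doc = nlp(text)
    g = nx.Graph()
    g.add_nodes_from((tok.i, {"text": tok.text}) for tok in doc)
    g.add_edges_from(
        (tok.i, tok.head.i, {"dep": tok.dep_})
        for tok in doc
        if tok.head.i != tok.i  # skip the root's self-loop
    )
    return g


if __name__ == "__main__":
    sentence = "Different graphs can represent the same text."
    print(reading_order_graph(sentence).edges())
    print(dependency_graph(sentence).edges(data=True))
```

Both functions produce node-attributed graphs over the same tokens, so the same GNN can consume either; only the edge structure (the representation) changes, which is exactly the variable the paper studies.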
