Abstract

Most existing graph neural networks (GNNs) developed for text-rich networks, which are now prevalent, treat texts simply as node attributes. This approach inevitably loses important semantic structure and restricts the representational power of GNNs. In this work, we introduce a document similarity-based graph convolutional network (DS-GCN) encoder that combines graph topology and document semantics for text-rich network representation. A graph decoder, based on the latent position model, then reconstructs the graph while preserving the network topology. The document matrix is rebuilt by a document decoder, based on the embedded topic model, which accounts for both topic and word embeddings. By including a cluster membership variable for each node in the network, we obtain an end-to-end clustering technique relying on a new deep probabilistic model called the graph embedded topic model (GETM). Numerical experiments on three simulated scenarios demonstrate the ability of GETM to fuse the graph topology and the document embeddings, and highlight its node clustering performance. Moreover, an application to the Cora-enrich citation network demonstrates the effectiveness and practical interest of GETM.
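
To make the described architecture concrete, the following is a minimal, hypothetical PyTorch sketch of the encoder-decoder structure outlined above (DS-GCN-style encoder, latent-position graph decoder, embedded-topic-model document decoder, and a cluster-membership component). All module names, layer sizes, and simplifications, such as the linear feature-propagation layers standing in for the DS-GCN, are illustrative assumptions and not the authors' implementation.

```python
# Schematic sketch of a GETM-like encoder-decoder; names and layers are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GETMSketch(nn.Module):
    def __init__(self, vocab_size, embed_dim, n_topics, n_clusters, latent_dim):
        super().__init__()
        # Encoder: simplified graph-convolution layers (feature propagation + linear maps)
        self.gcn1 = nn.Linear(vocab_size, 2 * latent_dim)
        self.gcn2 = nn.Linear(2 * latent_dim, 2 * latent_dim)  # outputs mean and log-variance
        # Embedded-topic-model decoder: topic and word embeddings share a common space
        self.word_emb = nn.Parameter(torch.randn(vocab_size, embed_dim))
        self.topic_emb = nn.Parameter(torch.randn(n_topics, embed_dim))
        self.to_topics = nn.Linear(latent_dim, n_topics)
        # Cluster-membership component: mixing proportions over K clusters
        self.cluster_logits = nn.Parameter(torch.zeros(n_clusters))

    def encode(self, adj_norm, docs):
        # Propagate document-term features over the normalized adjacency matrix
        h = F.relu(self.gcn1(adj_norm @ docs))
        mu, logvar = (adj_norm @ self.gcn2(h)).chunk(2, dim=-1)
        return mu, logvar

    def decode_graph(self, z):
        # Latent position model: edge probability decays with latent distance
        dist = torch.cdist(z, z)
        return torch.sigmoid(-dist)

    def decode_docs(self, z):
        # ETM-style reconstruction: topic proportions times topic-word distributions
        theta = F.softmax(self.to_topics(z), dim=-1)
        beta = F.softmax(self.topic_emb @ self.word_emb.T, dim=-1)
        return theta @ beta
```

In this sketch, `decode_graph` and `decode_docs` would each contribute a reconstruction term to the training objective, with the cluster-membership prior driving the end-to-end node clustering; the exact losses and inference scheme are those of the paper and are not reproduced here.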
