Deterministic sampling in heterogeneous graph neural networks

Fatemeh Ansarizadeh, David B Tay, Dhananjay Thiruvady, Antonio Robles-Kelly

doi:10.1016/j.patrec.2023.05.022

Abstract

Graphs are typically used to model datasets in which any given data point is correlated with only a small number of other data points, i.e. the correlations are localized. In some datasets the data points are of different types, which requires the use of heterogeneous graphs. Graph-based learning methods are used for analysis tasks such as node classification and link prediction. To exploit localized correlations in the learning process, the neighbourhood of a candidate root node typically has to be sampled. The data from the sampled nodes can then be embedded and aggregated for use in an end-to-end neural network architecture. Previous approaches to sampling are stochastic, e.g. random walk with restart. In this work, we propose a deterministic approach to sampling based on the notion of node importance in relation to a root node. Two factors contribute to importance: (i) the distance (number of edges) from the root node; and (ii) a centrality measure of the node. In this study, we adopt the Katz measure as the centrality measure. By devising an efficient sampling method together with node embedding and aggregation methods, we propose a Deterministic Heterogeneous Graph Neural Network (D-HetGNN). We apply D-HetGNN to three datasets, and an extensive experimental evaluation demonstrates the superiority of the proposed sampling.
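As an illustration of the two importance factors named above, the following is a minimal sketch of deterministic neighbourhood sampling on a small undirected graph. It is not the D-HetGNN procedure itself: the abstract does not state how distance and centrality are combined, so the score used here (Katz centrality divided by hop distance) is an assumption, as are the function names, the attenuation factor `alpha`, the hop limit `max_hops`, and the tie-breaking by node identifier. Node types, which a heterogeneous graph would carry, are omitted for brevity.

```python
from collections import deque


def katz_centrality(adj, alpha=0.1, beta=1.0, iters=200, tol=1e-12):
    """Katz centrality via the fixed-point iteration x <- alpha * A x + beta.

    Converges when alpha is smaller than 1 / (largest eigenvalue of A).
    """
    x = {v: 0.0 for v in adj}
    for _ in range(iters):
        new = {v: alpha * sum(x[u] for u in adj[v]) + beta for v in adj}
        if max(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x


def bfs_distances(adj, root, max_hops):
    """Hop distance from root to every node within max_hops edges."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        if dist[v] == max_hops:
            continue  # do not expand beyond the hop limit
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def deterministic_sample(adj, root, k, max_hops=2, alpha=0.1):
    """Return the k most important nodes near root, with no randomness.

    ASSUMPTION: importance = Katz centrality / hop distance. The source
    only says both factors contribute; this combination is illustrative.
    """
    katz = katz_centrality(adj, alpha=alpha)
    dist = bfs_distances(adj, root, max_hops)
    candidates = [v for v in dist if v != root]
    # Sort by descending score; break ties by node id so the order is fixed.
    candidates.sort(key=lambda v: (-katz[v] / dist[v], v))
    return candidates[:k]


# Toy undirected graph given as adjacency lists (hypothetical data).
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b", "e"],
    "e": ["d"],
}

sample = deterministic_sample(toy_graph, root="a", k=2)
```

Unlike a random walk with restart, repeated calls with the same graph and root return the same node list, so the sampled neighbourhood can be computed once and reused across training runs.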

Full Text