Abstract

We present a novel approach for imputing missing data that incorporates temporal information into bipartite graphs through an extension of graph representation learning. Missing data are abundant in several domains, particularly when observations are made over time. Most imputation methods make strong assumptions about the distribution of the data. While newer methods may relax some of these assumptions, they may not account for temporality, and when such methods are extended to handle time, they may not generalize without retraining. We propose a joint bipartite graph approach to incorporate temporal sequence information. Specifically, observation nodes and edges carrying temporal information are used in message passing to learn node and edge embeddings and to inform the imputation task. Our proposed method, temporal setting imputation using graph neural networks (TSI-GNN), captures sequence information that can then be used within the aggregation function of a graph neural network. To the best of our knowledge, this is the first effort to use a joint bipartite graph approach that captures sequence information to handle missing data. We use several benchmark datasets to test the performance of our method under a variety of conditions, comparing against both classic and contemporary methods. We further provide insight into managing the size of the generated TSI-GNN model. Through our analysis we show that incorporating temporal information into a bipartite graph improves the representation at 30% and 60% missing rates, particularly when a nonlinear model is used for downstream prediction on regularly sampled datasets, and that it is competitive with existing temporal methods under different scenarios.
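As a rough illustration of the joint bipartite graph described above (not the authors' exact construction), the NumPy sketch below maps rows of a time-indexed feature matrix to observation nodes, columns to feature nodes, and observed entries to edges whose features combine the observed value with a normalized time index. The function name, node-id scheme, and time encoding are illustrative assumptions.

```python
import numpy as np

def build_temporal_bipartite_graph(X, timestamps):
    """Turn a time-indexed feature matrix into a joint bipartite graph.

    X          : (n_obs, n_feat) array; np.nan marks missing entries.
    timestamps : (n_obs,) observation times (regular or irregular).

    Returns edge lists connecting observation nodes (rows) to feature
    nodes (columns). Each edge carries the observed value plus a
    temporal feature, so message passing can use sequence information.
    """
    n_obs, n_feat = X.shape
    src, dst, edge_feat = [], [], []
    for i in range(n_obs):
        for j in range(n_feat):
            if not np.isnan(X[i, j]):
                src.append(i)             # observation node id
                dst.append(n_obs + j)     # feature node id (offset)
                edge_feat.append([X[i, j], timestamps[i] / timestamps.max()])
    return np.array(src), np.array(dst), np.array(edge_feat)

# Toy example: 4 time steps, 3 features, a few missing entries;
# imputation amounts to predicting the values of the absent edges.
X = np.array([[1.0, np.nan, 0.5],
              [0.9, 2.0, np.nan],
              [np.nan, 2.1, 0.4],
              [1.1, 1.9, 0.6]])
src, dst, edge_feat = build_temporal_bipartite_graph(X, np.arange(4.0))
```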

Highlights

  • Graph representation learning (GRL) aims to accurately encode structural information about graph-based data into lower-dimensional vector representations (Hamilton, 2020)

  • We introduce temporal setting imputation using graph neural networks (TSI-GNN), which extends graph representation learning to handle missing data in temporal settings

  • While we evaluate TSI-GNN using the modified GraphSAGE architecture from GRAPE (You et al., 2020), our approach applies generally to GNN-based approaches that use a bipartite graph representation

Introduction

Graph representation learning (GRL) aims to accurately encode structural information about graph-based data into lower-dimensional vector representations (Hamilton, 2020). There are two node embedding approaches: shallow embedding methods and more complex encoder-based models (i.e., graph neural networks, GNNs) (Hamilton, 2020). Shallow embedding methods, such as inner-product and random-walk methods, are inherently transductive, meaning they can only generate embeddings for nodes seen during training. A key feature of GNNs is that they can use k rounds of message passing (inspired by belief propagation), where messages are aggregated from neighborhoods and combined with the representation from the previous layer/iteration to provide an updated representation (Hamilton, 2020).
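The message-passing update just described can be made concrete with a short sketch. The following minimal, self-contained NumPy example implements one round of mean aggregation over a toy bipartite graph; the function name, weight shapes, nonlinearity, and mean aggregator are illustrative assumptions rather than the exact GraphSAGE-style update used by TSI-GNN.

```python
import numpy as np

def message_passing_round(h, src, dst, edge_feat, W_msg, W_self):
    """One round of message passing with mean aggregation.

    h         : (n_nodes, d) node embeddings from the previous layer.
    src, dst  : (n_edges,) edge endpoints; messages flow src -> dst.
    edge_feat : (n_edges, d_e) edge features (e.g., value + time index).
    W_msg     : (d + d_e, d) message weights; W_self : (2d, d) update weights.
    """
    n_nodes, d = h.shape
    # message = transform of (sender embedding, edge feature)
    msgs = np.concatenate([h[src], edge_feat], axis=1) @ W_msg
    agg = np.zeros((n_nodes, d))
    np.add.at(agg, dst, msgs)                              # sum per target node
    deg = np.maximum(np.bincount(dst, minlength=n_nodes), 1)
    agg /= deg[:, None]                                    # mean over neighbors
    # combine the aggregated neighborhood message with the previous representation
    return np.tanh(np.concatenate([h, agg], axis=1) @ W_self)

# Toy bipartite graph: nodes 0-1 are observations, 2-3 are features.
rng = np.random.default_rng(0)
src = np.array([0, 0, 1, 2, 2, 3])          # edges listed in both directions
dst = np.array([2, 3, 2, 0, 1, 0])
edge_feat = rng.normal(size=(len(src), 2))  # e.g., [value, time index]
h = rng.normal(size=(4, 8))
h = message_passing_round(h, src, dst, edge_feat,
                          W_msg=rng.normal(size=(10, 8)) * 0.1,
                          W_self=rng.normal(size=(16, 8)) * 0.1)
```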
