Representation learning over dynamic graphs is critical for many real-world applications such as social network services and recommender systems. Temporal graph neural networks (T-GNNs) are powerful representation learning methods that have demonstrated remarkable effectiveness on continuous-time dynamic graphs. However, T-GNNs suffer from high time complexity, which increases linearly with the number of timestamps and grows exponentially with the model depth, preventing them from scaling to large dynamic graphs. To address these limitations, we propose Orca, a novel framework that accelerates T-GNN training by caching and reusing intermediate embeddings. We design an optimal caching policy, named MRD, for the uniform cache replacement problem, where embeddings at different intermediate layers have identical dimensions and recomputation costs. MRD not only improves the efficiency of training T-GNNs by maximizing the number of cache hits but also reduces approximation errors by avoiding keeping and reusing extremely stale embeddings. For the general cache replacement problem, where embeddings at different intermediate layers can have different dimensions and recomputation costs, we solve this NP-hard problem with a novel two-stage framework that provides approximation guarantees on the achieved benefit of caching. Furthermore, we develop a thorough theoretical analysis of the approximation errors introduced by reusing intermediate embeddings, providing a clear understanding of how our caching and reuse schemes affect model outputs. We also offer rigorous convergence guarantees for model training, adding to the reliability of the Orca framework. Extensive experiments validate that Orca obtains a two-orders-of-magnitude speedup over state-of-the-art T-GNNs while achieving higher precision on various dynamic graphs.