Abstract

Inspired by convolutional neural networks, graph convolutional networks (GCNs) have been proposed for processing non-Euclidean graph data and have been successfully applied in recommendation systems, smart traffic, and other domains. However, owing to the sparsity and irregularity of GCN models, the complex execution pattern of large-scale GCNs poses severe challenges to efficient inference on general-purpose CPUs and GPUs, such as workload imbalance and irregular memory access. We therefore propose a software-hardware co-design framework for low-latency GCN inference on field-programmable gate arrays. Specifically, at the algorithm level, we propose an attention-mechanism-based graph sparsification approach that removes redundant relations from the graph structure and alleviates irregularity without losing accuracy. At the hardware level, based on the sparsified graph, we propose a two-stage architecture that supports the two phases of GCN computation, each with a distinct execution mode. To achieve low-latency computation, edge-level and feature-level parallelism are exploited in the aggregation phase. In addition, a graph-partitioning strategy is employed to improve data reuse. Experimental results demonstrate that the proposed framework achieves average speedups of 739× over a CPU, 13.7× over a GPU, and 6.8× over state-of-the-art accelerators.
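The abstract's attention-based sparsification can be illustrated with a minimal sketch. The paper's exact scoring function is not given here, so the dot-product attention, the function name `attention_sparsify`, and the `keep_ratio` parameter below are all assumptions for illustration: each edge is scored by the attention between its endpoints' projected features, and only the highest-scoring fraction of edges is kept.

```python
import numpy as np

def attention_sparsify(adj, features, w_att, keep_ratio=0.5):
    """Hypothetical sketch of attention-based graph sparsification.

    adj:      dense (N, N) adjacency matrix
    features: (N, F) node feature matrix
    w_att:    (F, D) learned attention projection (assumed form)
    Keeps only the top `keep_ratio` fraction of edges by attention score.
    """
    rows, cols = np.nonzero(adj)            # endpoints of every edge
    proj = features @ w_att                 # project node features
    # dot-product attention score per edge (assumed scoring form)
    scores = np.einsum("ij,ij->i", proj[rows], proj[cols])
    k = max(1, int(len(scores) * keep_ratio))
    keep = np.argsort(scores)[-k:]          # indices of top-k edges
    sparse = np.zeros_like(adj)
    sparse[rows[keep], cols[keep]] = adj[rows[keep], cols[keep]]
    return sparse
```

Pruning low-scoring edges in this way shrinks and regularizes the aggregation workload, which is what makes the downstream two-stage hardware pipeline easier to balance.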
