Abstract

Graph attention networks (GATs) are a mainstream class of graph neural network (GNN) models and outperform other GNN models on several tasks. However, graph data structures are irregular and the data dependencies in GATs are complex, so general-purpose hardware cannot deliver sufficient performance or energy efficiency; a specialized GAT accelerator is therefore needed. In this brief, we propose FTW-GAT to accelerate GAT inference. The key idea of our approach is to quantize the weights of GATs to ternary values, which greatly simplifies the processing elements (PEs), eliminates the dependence on digital signal processors (DSPs), and reduces power consumption. We then apply operation fusion, multi-level pipelining, and graph partitioning to improve parallelism. Finally, we implement the accelerator on a Xilinx VCU128 FPGA platform. The results show that FTW-GAT achieves speedups of 390×, 17×, and 1.4×, and energy-efficiency improvements of 4007×, 261×, and 3.1×, compared to CPUs, GPUs, and a prior GAT accelerator, respectively.
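To illustrate the key idea, the sketch below shows one common threshold-based weight ternarization scheme; the thresholding rule, the `delta_scale` parameter, and the shared scaling factor are assumptions for illustration, not necessarily the paper's exact method. With weights restricted to {-1, 0, +1}, each multiply in a PE collapses into an add, a subtract, or a skip, which is what removes the need for DSP multipliers.

```python
import numpy as np

def ternarize_weights(w, delta_scale=0.7):
    """Minimal ternarization sketch (hypothetical parameters).

    Maps each weight to {-1, 0, +1} plus a shared per-tensor scale alpha:
    values with |w| below a threshold become 0; the rest keep their sign.
    """
    delta = delta_scale * np.mean(np.abs(w))                # heuristic threshold (assumption)
    mask = np.abs(w) > delta                                # positions that stay non-zero
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0   # shared scale, applied once per dot product
    w_ternary = (np.sign(w) * mask).astype(np.int8)         # entries in {-1, 0, +1}
    return w_ternary, alpha

# Example: ternarize a random weight matrix.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16)).astype(np.float32)
w_t, alpha = ternarize_weights(w)
```

In hardware, the accumulation over a ternary weight vector needs only adders and sign logic, with the single float multiply by alpha deferred to the end of each dot product.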
