Abstract

Recently, graph neural networks (GNNs) have achieved excellent performance on many graph-related tasks. Typical GNNs follow the neighborhood aggregation strategy, which updates a node's representation by aggregating the features of its neighboring nodes. However, the resulting hybrid execution patterns limit their deployment on resource-limited devices. Quantization is an effective technique for accelerating deep neural network (DNN) inference, but few studies have explored quantization algorithms suited to GNNs. In this paper, we propose degree-based quantization (DBQ), which identifies sensitive nodes in the graph structure. Protective masks ensure that sensitive nodes are processed at full precision, while the remaining nodes are quantized. In effect, the precision is adjusted dynamically, yielding greater acceleration while preserving classification accuracy. To support DBQ and translate it into performance gains, we design a new accelerator architecture whose elaborate pipelines and specialized optimizations effectively improve inference speed and accuracy. Compared to state-of-the-art GNN accelerators, DBQ achieves a 2.4× speedup and improves accuracy by 27.7%.
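The abstract does not specify how sensitive nodes are selected beyond the use of node degree, so the following is only a minimal sketch of the idea: nodes whose degree exceeds a threshold are flagged by a protective mask and kept at full precision, while all other nodes' features are quantized. The function names, the degree threshold, and the symmetric INT8 fake-quantization are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def degree_based_mask(adj, degree_threshold):
    """Protective mask: flag high-degree ('sensitive') nodes from a binary adjacency matrix.
    The degree threshold is an assumed selection rule, not the paper's criterion."""
    degrees = adj.sum(axis=1)
    return degrees >= degree_threshold

def quantize_int8(x):
    """Uniform symmetric INT8 fake-quantization (quantize then dequantize)."""
    scale = np.abs(x).max() / 127.0 + 1e-12
    return np.clip(np.round(x / scale), -128, 127) * scale

def mixed_precision_features(features, sensitive_mask):
    """Keep rows of sensitive nodes at full precision; quantize the remaining rows."""
    out = features.copy()
    out[~sensitive_mask] = quantize_int8(features[~sensitive_mask])
    return out

# Toy usage: 5 nodes with 4-dimensional features on a random graph.
rng = np.random.default_rng(0)
adj = (rng.random((5, 5)) > 0.6).astype(np.float32)
features = rng.standard_normal((5, 4)).astype(np.float32)
mask = degree_based_mask(adj, degree_threshold=2)
mixed = mixed_precision_features(features, mask)
```

In a full GNN pipeline, the mixed-precision features would then feed the aggregation and combination stages, with sensitive nodes computed at full precision and the rest on low-precision units.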
