Abstract

As applications of Graph Convolutional Networks (GCNs) grow, so does the demand for efficient hardware acceleration. Compared with CNN tasks, GCN tasks pose new challenges such as randomness, sparsity, and nonuniformity, which lead to poor performance on previous AI accelerators. In this paper, we propose DyGA, a hardware-efficient GCN accelerator that features graph partitioning, a customized storage policy, traffic-aware dynamic scheduling, and out-of-order execution. Synthesized and evaluated in TSMC 28-nm technology, the accelerator achieves an average throughput of over 95% of its peak performance, with full hardware utilization, on representative graph data sets. With a high area efficiency of 0.217 GOPS/K-logic-gates and 8.06 GOPS/KB-PE-buffer, and thus an energy efficiency of 384 GOPS/W, the proposed accelerator outperforms previous state-of-the-art works in sparse data processing.
