Abstract

Systolic array architectures are widely used in spatial hardware and are well suited to many tensor processing algorithms. Many systolic array architectures are implemented with a high-level synthesis (HLS) design flow. However, existing HLS tools do not favor modular and reusable design, which makes design iteration inefficient. In this article, we analyze the systolic array design space and identify the structures common to different systolic dataflows. We build hardware module templates on the Chisel infrastructure that can be reused across dataflows and computation algorithms. This markedly improves productivity in developing and optimizing systolic architectures. We further build a systolic array generator that transforms a tensor algorithm definition into a complete systolic hardware architecture. Experiments show that we can implement systolic array designs for different applications and dataflows with little engineering effort, and that their throughput exceeds that of HLS designs.

Highlights

  • Systolic array architecture is widely used in spatial hardware and well-suited for many tensor processing algorithms

  • Experimental setup: We evaluate the performance and programming efficiency of our systolic generator with GEMM and other tensor applications, and compare the GEMM results with several existing high-level synthesis (HLS)-based works [3, 7, 8]. The systolic array designs are synthesized and implemented on a Xilinx VU9P FPGA platform with Xilinx Vivado 2018.2

  • Our implementation uses Chisel’s ready-valid interface for data communication. This avoids both the unified HLS programming interface, which introduces extra data dependences, and the complex finite state machine generated by the HLS compiler
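The ready-valid handshake mentioned in the last highlight can be illustrated with a small cycle-level model. This is only a sketch in Python, not the authors' Chisel code: the function name `simulate_ready_valid` and the `ready_pattern` argument are hypothetical, and the producer is assumed to assert `valid` whenever it still has data to send. The point it shows is that a transfer happens only in a cycle where `valid` and `ready` are both high, so the two sides need no shared control state machine.

```python
def simulate_ready_valid(items, ready_pattern):
    """Model one ready/valid channel, one loop iteration per clock cycle.

    items:         values the producer wants to send, in order (assumption)
    ready_pattern: per-cycle booleans driven by the consumer (assumption)
    Returns the received values and a per-cycle log of (cycle, valid, ready, fire).
    """
    received = []
    log = []
    idx = 0  # index of the next item the producer is offering
    for cycle, ready in enumerate(ready_pattern):
        valid = idx < len(items)   # producer offers data while any remains
        fire = valid and ready     # a transfer occurs only when both are high
        if fire:
            received.append(items[idx])
            idx += 1               # producer advances only after a transfer
        log.append((cycle, valid, ready, fire))
    return received, log


received, log = simulate_ready_valid([10, 20, 30], [True, False, True, True, False])
print(received)  # [10, 20, 30]
```

In cycle 1 the consumer is not ready, so the producer simply keeps offering the same item; no data is lost and no extra dependence between the two modules is introduced.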


Summary

Yun Liang, Peking University

Abstract: Systolic array architectures are widely used in spatial hardware and are well suited to many tensor processing algorithms. We build hardware module templates on the Chisel infrastructure that can be reused across dataflows and computation algorithms. This markedly improves productivity in developing and optimizing systolic architectures. Experiments show that we can implement systolic array designs for different applications and dataflows with little engineering effort, and that their throughput exceeds that of HLS designs.

Tensor algebra is a prevalent tool in modern computer applications and is increasingly deployed onto various embedded devices. Systolic array architectures, which achieve high computation parallelism and data reuse with an array of processing elements (PEs), are widely adopted in accelerator designs. Systolic architectures are also used in many other applications, such as convolution, FFT, and matrix decomposition.
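To make the idea of parallel PEs with operand reuse concrete, the following is a minimal cycle-level sketch of one common systolic dataflow for GEMM, the output-stationary one. It is an illustration under stated assumptions, not the generator described in the article: the function name `systolic_gemm` is hypothetical, the array has one PE per output element, rows of A enter from the left with a one-cycle skew per row, and columns of B enter from the top with a one-cycle skew per column.

```python
def systolic_gemm(A, B):
    """Simulate C = A x B on an n-by-m output-stationary systolic array.

    Each PE (i, j) keeps its own accumulator C[i][j]. A values move one PE
    to the right per cycle, B values move one PE down per cycle, so every
    operand is fetched once at the array edge and reused along a row or column.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    a_reg = [[0] * m for _ in range(n)]  # A operand latched in each PE
    b_reg = [[0] * m for _ in range(n)]  # B operand latched in each PE
    cycles = n + m + k - 2               # last PE finishes at cycle n+m+k-3
    for t in range(cycles):
        # Sweep bottom-right to top-left so each PE still sees the value its
        # left/upper neighbour latched in the previous cycle.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j == 0:               # left edge: skewed feed of row i of A
                    s = t - i
                    a_in = A[i][s] if 0 <= s < k else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:               # top edge: skewed feed of column j of B
                    s = t - j
                    b_in = B[s][j] if 0 <= s < k else 0
                else:
                    b_in = b_reg[i - 1][j]
                C[i][j] += a_in * b_in   # multiply-accumulate in place
                a_reg[i][j] = a_in
                b_reg[i][j] = b_in
    return C, cycles


C, cycles = systolic_gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C, cycles)  # [[19, 22], [43, 50]] 4
```

Other dataflows keep a different tensor stationary in the PEs, but the same ingredients recur: a local register per operand, a neighbour-to-neighbour link, and a skewed feed at the array boundary.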

Published by the IEEE Computer Society
SYSTOLIC ARRAY ARCHITECTURE
GENERATING SYSTOLIC ARRAYS WITH REUSABLE COMPONENTS
Decoupled Generation of Controller and PE Structure
EVALUATION
Performance Comparison of Different Dataflow and Data Types
CONCLUSION
REFERENCES