With the inherent algorithmic error resilience of conventional neural networks (CNN) and the worst-case design methodologies of current electronic design automation tools, overclocking-based timing speculation is a promising technique to improve the performance of CNN accelerators on FPGA by removing unnecessary timing margins. To avoid potential timing errors, timing delay measurement should be used during overclocking. However, current approaches are not yet good at measuring paths with more intense variability factors such as jitter, and lack an automated process for testing circuit delays. In this paper, we first propose 2-dimension multi-frame fusion to deal with the sampling jitter, then present a timing delay measurement-based automatic overclocking system (AOS) running on heterogeneous FPGA for high-performance CNN accelerators. On the FPGA side, AOS is composed of timing delay monitors (TDM) that can measure all types of timing paths, a TDM controller that converts the sampled values of TDMs into timing delay in terms of the ratio of path delay to the clock period. On the CPU side, AOS converts the path delay from clock period ratio to absolute delay value and decides the frequency of the accelerator in the next iteration. We demonstrate AOS with a SkyNet accelerator on the Xilinx ZCU104 board and achieve 657FPS at 436MHz without accuracy degradation, which is 1.41× performance compared to the baseline.
Read full abstract