On-chip Block Memories Research Articles

Standard convolutional neural networks (CNNs) contain a large amount of data redundancy, and the same accuracy can be obtained with lower-bit weights in place of a floating-point representation. Most CNNs are developed and executed on high-end GPU-based workstations, and it is hard to port these implementations to portable edge FPGAs because of limited on-chip block memory and battery capacity. In this paper, we present the adaptive pointwise convolution and 2D convolution joint network (AP2D-Net), an ultra-low-power, relatively high-throughput system that uses dynamic-precision weights and activations. We trade off accuracy against power efficiency, using unmanned aerial vehicle (UAV) object detection as the target scenario. We evaluate the system on the Zynq UltraScale+ MPSoC Ultra96 mobile FPGA platform. The target board reaches a real-time speed of 30 fps at 5.6 W, and the FPGA on-chip power is only 0.6 W. The power efficiency of our system is 2.8× better than that of the best system design on a Jetson TX2 GPU and 1.9× better than that of the design on a PYNQ-Z1 SoC FPGA.
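To illustrate what replacing floating-point weights with lower-bit weights can look like, here is a minimal sketch of symmetric uniform quantization. It is a generic textbook scheme, not the dynamic-precision method of AP2D-Net, which the abstract does not describe; the function names `quantize_weights` and `dequantize` are hypothetical.

```python
def quantize_weights(weights, bits):
    """Symmetric uniform quantization of floats to signed `bits`-bit integers.

    Generic illustration only; not the scheme used by AP2D-Net.
    Returns the integer codes and the scale needed to map them back.
    """
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    w = [0.5, -1.0, 0.25, 0.03]
    for bits in (8, 4, 2):
        codes, scale = quantize_weights(w, bits)
        approx = dequantize(codes, scale)
        err = max(abs(a - b) for a, b in zip(w, approx))
        print(bits, codes, f"max error = {err:.4f}")
```

The rounding error of each weight is bounded by half the scale, so fewer bits mean a coarser grid and less on-chip memory per weight; whether accuracy survives depends on the network and is an empirical question.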

A compact, fast, and accurate realization of a digital Gaussian variate generator (GVG) based on the Box-Muller algorithm is presented. The proposed GVG has a faster Gaussian sample generation rate and higher tail accuracy with a lower hardware cost than published designs. The GVG design can be readily configured to achieve arbitrary tail accuracy (i.e., with a proposed 16-bit datapath up to ±15 times the standard deviation σ) with only small variations in hardware utilization, and without degrading the output sample rate. Polynomial curve fitting is utilized along with a hybrid (i.e., combination of logarithmic and uniform) segmentation and a scaling scheme to maintain accuracy. A typical instantiation of the proposed GVG occupies only 534 configurable slices, two on-chip block memories, and three dedicated multipliers of the Xilinx Virtex-II XC2V4000-6 field-programmable gate array (FPGA) and operates at 248 MHz, generating 496 million Gaussian variates (GVs) per second within a range of ±6.66σ. To accurately achieve a range of ±9.4σ, the GVG uses 852 configurable slices, three block memories, and three on-chip dedicated multipliers of the same FPGA while still operating at 248 MHz, generating 496 million GVs per second. The core area and performance of a GVG implemented in a 90-nm CMOS technology are also given. The statistical characteristics of the GVG are evaluated and confirmed using multiple standard statistical goodness-of-fit tests.
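As a software reference for the Box-Muller algorithm named in the abstract, the sketch below uses floating-point math in place of the paper's fixed-point polynomial approximations, which the abstract does not detail. The helper `max_sigma` is a hypothetical addition: it assumes the first uniform input is quantized to a given number of bits, and shows how that width alone bounds the reachable tail.

```python
import math
import random


def box_muller(u1, u2):
    """Map uniforms u1 in (0, 1] and u2 in [0, 1) to two independent N(0, 1) samples."""
    r = math.sqrt(-2.0 * math.log(u1))     # radius from the first uniform
    theta = 2.0 * math.pi * u2             # angle from the second uniform
    return r * math.cos(theta), r * math.sin(theta)


def max_sigma(bits):
    """Largest |x| reachable if the smallest nonzero u1 is 2**-bits (assumed model)."""
    return math.sqrt(2.0 * bits * math.log(2.0))


def gaussian_samples(n, seed=0):
    """Draw n standard normal samples, two per Box-Muller evaluation."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u1 = 1.0 - rng.random()            # shift [0, 1) to (0, 1] so log(u1) is finite
        u2 = rng.random()
        out.extend(box_muller(u1, u2))
    return out[:n]


if __name__ == "__main__":
    xs = gaussian_samples(20000)
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    print(f"mean = {mean:.3f}, variance = {var:.3f}")
    print(f"32-bit u1: +/-{max_sigma(32):.2f} sigma, 64-bit u1: +/-{max_sigma(64):.2f} sigma")
```

Each evaluation yields two variates, which is consistent with the quoted 496 million GVs per second at 248 MHz (two per clock cycle). Under the assumed quantization model, 32-bit and 64-bit uniforms give tails of about 6.66σ and 9.42σ, which line up with the ranges quoted above, although the abstract does not state the uniform word widths.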

Articles published on On-chip Block Memories

Novel CNN-Based AP2D-Net Accelerator: An Area and Power Efficient Solution for Real-Time Applications on Mobile FPGA

Multiple Cell Upset Injection in BRAMs for Xilinx FPGAs

FPGA Implementation of Range Addressable Activation Function for Lattice-Ladder Neuron

A real-time sound rendering system based on the finite-difference time-domain algorithm

Hardware Implementation of Rayleigh and Ricean Variate Generators

A Compact and Accurate Gaussian Variate Generator

Apparatus and system for real-time synthetic focus ultrasonic imaging
