Single-port SRAM Research Articles

Image processing and machine learning applications benefit tremendously from hardware acceleration. Existing compilers target either FPGAs, which sacrifice power and performance for programmability, or ASICs, which become obsolete as applications change. Programmable domain-specific accelerators, such as coarse-grained reconfigurable arrays (CGRAs), have emerged as a promising middle ground, but they have traditionally been difficult compiler targets because they use a different memory abstraction. In contrast to CPUs and GPUs, the memory hierarchies of domain-specific accelerators use push memories: memories that send input data streams to computation kernels or to higher or lower levels in the memory hierarchy and store the resulting output data streams. To address the compilation challenge caused by push memories, we propose that the compiler represent these memories directly, combining storage with address generation and control logic in a single structure: a unified buffer. The unified buffer abstraction enables the compiler to separate generic push memory optimizations from the mapping to specific memory implementations in the backend. This separation allows our compiler to map high-level Halide applications to different CGRA memory designs, including some with a ready-valid interface. The separation also opens the opportunity for optimizing push memory elements on reconfigurable arrays. Our optimized memory implementation, the Physical Unified Buffer, uses a wide-fetch, single-port SRAM macro with built-in address generation logic to implement a buffer with two read and two write ports. It is 18% smaller and consumes 31% less energy than a physical buffer implementation using a dual-port memory that only supports two ports. Finally, our system evaluation shows that enabling a compiler to support CGRAs leads to performance and energy benefits. Over a wide range of image processing and machine learning applications, our CGRA achieves 4.7× better runtime and 3.5× better energy-efficiency compared to an FPGA.
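
The idea of pairing storage with address generation so that a wide-fetch, single-port SRAM can serve narrow streaming ports can be illustrated with a small behavioral model. The sketch below is not the paper's Physical Unified Buffer: the class names, the fetch width of 4 words, the arbitration rule, and the restriction to one write stream and one read stream are all assumptions made for illustration. It gathers incoming words into one wide write, fetches one wide row to serve several narrow reads, and asserts that the SRAM port is never used twice in a cycle.

```python
from collections import deque

FETCH_WIDTH = 4  # words per SRAM access (assumed value, not from the paper)


class SinglePortSRAM:
    """Wide-word memory that allows at most one access (read or write) per cycle."""

    def __init__(self, depth):
        self.rows = [None] * depth
        self.busy = False  # True once the single port has been used this cycle

    def tick(self):
        self.busy = False  # a new cycle frees the port

    def write(self, row, words):
        assert not self.busy, "port conflict"
        self.busy = True
        self.rows[row] = list(words)

    def read(self, row):
        assert not self.busy, "port conflict"
        self.busy = True
        return list(self.rows[row])


class PushFifo:
    """Hypothetical push memory: storage plus address generators and control."""

    def __init__(self, depth=8):
        self.sram = SinglePortSRAM(depth)
        self.depth = depth
        self.agg = []       # write side: gathers narrow words into a wide row
        self.out = deque()  # read side: serializes a wide row into narrow words
        self.wr_row = 0     # write address generator (sequential)
        self.rd_row = 0     # read address generator (sequential)

    def cycle(self, word_in=None):
        """Advance one cycle: accept at most one word, emit at most one word."""
        self.sram.tick()
        if word_in is not None:
            self.agg.append(word_in)
        # Control logic: a full aggregator gets the port first; otherwise the
        # read side may refill itself from the next unread row.
        if len(self.agg) == FETCH_WIDTH:
            self.sram.write(self.wr_row % self.depth, self.agg)
            self.agg = []
            self.wr_row += 1
        elif not self.out and self.rd_row < self.wr_row:
            self.out.extend(self.sram.read(self.rd_row % self.depth))
            self.rd_row += 1
        return self.out.popleft() if self.out else None


def run_stream(n_words, idle_cycles=8):
    """Push n_words into the buffer, one per cycle, and collect what comes out."""
    fifo = PushFifo()
    outputs = []
    for t in range(n_words + idle_cycles):
        result = fifo.cycle(t if t < n_words else None)
        if result is not None:
            outputs.append(result)
    return outputs


if __name__ == "__main__":
    print(run_stream(16))
```

In this model each wide access moves four words, so a stream of one word in and one word out per cycle needs the port only two cycles out of every four. That spare bandwidth is what would let a wide-fetch single-port memory serve additional ports, at the cost of the aggregation and serialization registers around it.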

This paper presents a multimode memory-based Fast Fourier Transform (FFT) processor for a medical system aimed at Fourier-domain optical coherence tomography (FD-OCT) capable of supporting wireless displays based on multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM). The proposed FFT processor enables the use of 2-stream 4096/2048/1024-point FFTs and 1- to 4-stream 128/64-point FFTs for FD-OCT and OFDM applications, respectively. Using cost-effective four-bank single-port SRAM operating in four-word data width, the proposed design provides data access for up to sixteen memory paths. In conjunction with a proposed FFT kernel devised using hardware-efficient multiplication and cache units, the proposed system allows high-throughput multimode FFT operations in an energy- and area-efficient configuration. A test chip was designed using TSMC 0.18 μm CMOS technology with a core size of 4.8 mm². Post-layout simulation of the 4096-point FFT at 80 MHz and the 128-point FFT at 40 MHz achieved throughputs of 152 MS/s and 160 MS/s with power consumption of 156.2 mW and 69.9 mW, respectively. Compared to previous approaches that fully or partially support the specified OCT/OFDM FFTs, our design shows different degrees of area or energy efficiency improvement depending on the FFT operation mode. In addition, system-level verification for practical OCT imaging was also performed using an FPGA platform.
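
The figures quoted in this abstract can be related to one another with a little arithmetic. The sketch below derives three quantities that the abstract does not state directly: the number of memory paths implied by four banks of four-word rows, the samples processed per clock cycle, and the energy per sample. The function names are hypothetical, and the derived values are only as precise as the rounded figures in the abstract.

```python
def memory_paths(banks, words_per_access):
    """Words reachable per cycle when every bank serves one wide access."""
    return banks * words_per_access


def samples_per_cycle(throughput_msps, clock_mhz):
    """Throughput (MS/s) divided by clock (MHz) gives samples per clock cycle."""
    return throughput_msps / clock_mhz


def energy_per_sample_nj(power_mw, throughput_msps):
    """mW / (MS/s) = 1e-3 W / 1e6 samples/s = 1e-9 J, i.e. nJ per sample."""
    return power_mw / throughput_msps


if __name__ == "__main__":
    print("memory paths:", memory_paths(banks=4, words_per_access=4))
    for name, clock, rate, power in [
        ("4096-point", 80, 152, 156.2),
        ("128-point", 40, 160, 69.9),
    ]:
        print(
            name,
            f"{samples_per_cycle(rate, clock):.2f} samples/cycle,",
            f"{energy_per_sample_nj(power, rate):.3f} nJ/sample",
        )
```

Under these figures, four banks of four words account for the sixteen memory paths, and the 128-point mode comes out at roughly 0.44 nJ per sample against roughly 1.03 nJ per sample for the 4096-point mode.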

Articles published on Single-port SRAM

Impact of total ionizing dose on the alpha-soft error rate in FDSOI 28 nm SRAMs

Single-Event Upsets for Single-Port and Two-Port SRAM Cells at the 5-nm FinFET Technology

Evaluation of the Single-Event-Upset Vulnerability for Low-Energy Protons at the 7- and 5-nm Bulk FinFET Nodes

Study of Multicell Upsets in SRAM at a 5-nm Bulk FinFET Node

Unified Buffer: Compiling Image Processing and Machine Learning Applications to Push-Memory Accelerators

Design and implementation of Dual-Port Memory

Programmable FFT Processor using Dual RAM and ROM Technologies for Future 5G Communications

Pseudo Multi-Port SRAM Circuit for Image Processing in Display Drivers

Five-Transistor Single-Port SRAM Bit Cell with High Speed and Low Standby Current

A Hardware Decoder Architecture for General String Matching Technique

A High-Throughput and Multi-Parallel VLSI Architecture for HEVC Deblocking Filter

A Parallel-Access Mapping Method for the Data Exchange Buffers Around DCT/IDCT in HEVC Encoders Based on Single-Port SRAMs

Multimode Memory-Based FFT Processor for Wireless Display FD-OCT Medical Systems

Single-Port SRAM-Based Transpose Memory With Diagonal Data Mapping for Large Size 2-D DCT/IDCT

A leakage current suppression technique for cascade SRAM array in 55 nm CMOS technology

Low-Complexity Multi-Mode Memory-Based FFT Processor for DVB-T2 Applications

48 Cycles-per-macro block deblocking filter accelerator for high-resolution H.264/AVC decoding

A Programmable, Scalable-Throughput Interleaver

An 11 mm², 70 mW Fully Programmable Baseband Processor for Mobile WiMAX and DVB-T/H in 0.12 μm CMOS

Low-cost reconfigurable VLSI architecture for fast fourier transform
