Scratchpad Memory Research Articles

Wireless communication standards such as Long-Term Evolution (LTE) are rapidly changing to support the high data rates of wireless devices. The physical layer baseband processing has strict real-time deadlines, especially in the next-generation applications enabled by the 5G standard. Existing basestation transceivers utilize customized DSP cores or fixed-function hardware accelerators for physical layer baseband processing. However, these approaches incur significant non-recurring engineering costs and are inflexible to newer standards or updates. Software-programmable processors offer more adaptability. However, it is challenging to sustain guaranteed worst-case latency and throughput at reasonably low power on shared-memory many-core architectures featuring inherently unpredictable design choices, such as caches and a network-on-chip (NoC). We propose SPECTRUM, a predictable, software-defined many-core architecture that exploits the massive parallelism of the LTE/5G baseband processing workload. The focus is on designing scalable lightweight hardware that can be programmed and defined by sophisticated software mechanisms. SPECTRUM employs hundreds of lightweight in-order cores augmented with custom instructions that provide predictable timing, a purely software-scheduled NoC that orchestrates the communication to avoid any contention, and per-core software-controlled scratchpad memory with deterministic access latency. Compared to a many-core architecture such as Skylake-SP (average power 215 W), which drops 14% of packets at high traffic load, the 256-core SPECTRUM by definition has a zero packet drop rate at a significantly lower average power of 24 W. SPECTRUM consumes 2.11× lower power than a C66x DSP cores + accelerator platform in baseband processing.
We also enable SPECTRUM to handle dynamic workloads with the multiple service categories present in 5G mobile networks (Enhanced Mobile Broadband (eMBB), Ultra-reliable and Low-latency Communications (URLLC), and Massive Machine Type Communications (mMTC)), using a run-time scheduling and mapping algorithm. Experimental evaluations show that our algorithm performs task/NoC mapping at run-time on fewer cores than static mapping (which reserves cores exclusively for each service category) while still meeting the differentiated latency and reliability requirements.
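
The intuition behind needing fewer cores than a static per-category reservation can be illustrated with a small sketch. This is not the SPECTRUM run-time algorithm, which the abstract does not describe; it only compares reserving each category's own peak against sizing one shared pool for the peak combined demand. The demand figures and the function names `static_cores` and `pooled_cores` are hypothetical, and latency and reliability constraints are ignored.

```python
def static_cores(demand):
    """Cores needed when every service category reserves its own peak demand."""
    return sum(max(series) for series in demand.values())


def pooled_cores(demand):
    """Cores needed when all categories share one pool sized for the peak total."""
    per_slot_totals = (sum(slot) for slot in zip(*demand.values()))
    return max(per_slot_totals)


# Hypothetical per-time-slot core demand for each 5G service category.
demand = {
    "eMBB": [40, 10, 25, 5],
    "URLLC": [5, 30, 10, 8],
    "mMTC": [2, 4, 20, 3],
}

print(static_cores(demand))  # 90: the three peaks 40 + 30 + 20
print(pooled_cores(demand))  # 55: the busiest slot, 25 + 10 + 20
```

Because the categories in this made-up trace peak in different slots, the shared pool is smaller than the sum of the individual reservations; when all categories peak together the two numbers coincide.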

Deep neural networks (DNNs) achieve best-known accuracies in many machine learning tasks involved in image, voice, and natural language processing and are being used in an ever-increasing range of applications. However, their algorithmic benefits are accompanied by extremely high computation and storage costs, sparking intense efforts in optimizing the design of computing platforms for DNNs. Today, graphics processing units (GPUs) and specialized digital CMOS accelerators represent the state-of-the-art in DNN hardware, with near-term efforts focusing on approximate computing through reduced precision. However, the ever-increasing complexities of DNNs and the data they process have fueled an active interest in alternative hardware fabrics that can deliver the next leap in efficiency. Resistive crossbars designed using emerging nonvolatile memory technologies have emerged as a promising candidate building block for future DNN hardware fabrics since they can natively execute massively parallel vector-matrix multiplications (the dominant compute kernel in DNNs) in the analog domain within the memory arrays. Leveraging in-memory computing and dense storage, resistive-crossbar-based systems cater to both the high computation and storage demands of complex DNNs and promise energy efficiency beyond current DNN accelerators by mitigating data transfer and memory bottlenecks. However, several design challenges need to be addressed to enable their adoption. For example, the overheads of peripheral circuits (analog-to-digital converters and digital-to-analog converters) and other components (scratchpad memories and on-chip interconnect) may significantly diminish the efficiency benefits at the system level. Additionally, the analog crossbar computations are intrinsically subject to noise due to a range of device- and circuit-level nonidealities, potentially leading to lower accuracy at the application level. 
In this article, we highlight the prospects for designing hardware accelerators for neural networks using resistive crossbars. We also underscore the key open challenges and some possible approaches to address them.
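
As a rough illustration of the computation described above, the following sketch models a crossbar vector-matrix multiplication in plain Python: inputs pass through a DAC, each column sums the products of row voltages and cell conductances, optional Gaussian noise perturbs each cell, and an ADC quantizes the column result. The value ranges, bit widths, noise model, and the names `quantize` and `crossbar_vmm` are assumptions made for illustration, not details taken from the article.

```python
import random


def quantize(x, lo, hi, bits):
    """Clamp x to [lo, hi] and round it to one of 2**bits uniform levels."""
    levels = (1 << bits) - 1
    x = min(max(x, lo), hi)
    step = (hi - lo) / levels
    return lo + round((x - lo) / step) * step


def crossbar_vmm(inputs, weights, dac_bits=8, adc_bits=8, noise_std=0.0, rng=None):
    """Approximate y[j] = sum_i inputs[i] * weights[i][j] as a crossbar might.

    inputs are row voltages in [0, 1]; weights are cell conductances in [0, 1].
    noise_std is the standard deviation of per-cell Gaussian conductance noise.
    """
    rng = rng or random.Random(0)
    rows, cols = len(weights), len(weights[0])
    voltages = [quantize(x, 0.0, 1.0, dac_bits) for x in inputs]  # DAC stage
    outputs = []
    for j in range(cols):
        current = 0.0
        for i in range(rows):
            conductance = weights[i][j] + rng.gauss(0.0, noise_std)
            current += voltages[i] * conductance  # currents add along the column
        # ADC stage: a column current can be at most `rows` in this model.
        outputs.append(quantize(current, 0.0, float(rows), adc_bits))
    return outputs


weights = [[0.2, 0.4], [0.6, 0.8]]
print(crossbar_vmm([1.0, 0.5], weights))                  # close to [0.5, 0.8]
print(crossbar_vmm([1.0, 0.5], weights, noise_std=0.05))  # perturbed by noise
```

Lowering `adc_bits` or raising `noise_std` in this toy model shows how converter resolution and device nonidealities each pull the result away from the exact product.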

Articles published on Scratchpad Memory

DESCNet: Developing Efficient Scratchpad Memories for Capsule Network Hardware

SPX64

Parallelization and Optimization of NSGA-II on Sunway TaihuLight System

Space‐address decoupled scratchpad memory management for neural network accelerators

A Vector-Length Agnostic Compiler for the Connex-S Accelerator with Scratchpad Memory

SPECTRUM

NVM-Shelf: Secure Hybrid Encryption with Less Flip for Non-Volatile Memory

Cache Leakage Reduction Techniques for Hybrid SPM-Cache Architectures

Protecting scratchpad memory addresses against soft errors

Enhancing Matrix Multiplication With a Monolithic 3-D-Based Scratchpad Memory

Branch-aware data variable allocation for energy optimization of hybrid SRAM+NVM SPM

Efficient parallelization of multilevel fast multipole algorithm for electromagnetic simulation on many-core SW26010 processor

Compiling for the Worst Case

A Hardware-assisted Heartbeat Mechanism for Fault Identification in Large-scale IoT Systems

A 28-nm Coarse Grain 2D-Reconfigurable Array With Data Forwarding

SwHPFM: Refactoring and Optimizing the Structured Grid Fluid Mechanical Algorithm on the Sunway TaihuLight Supercomputer

Utilization-Aware Data Variable Allocation on NVM-Based SPM in Real-Time Embedded Systems

A Low-Power, High-Performance Speech Recognition Accelerator

Neural network accelerator design with resistive crossbars: Opportunities and challenges

Hybrid Scratchpad Video Memory Architecture for Energy-Efficient Parallel HEVC

