External Memory Research Articles

The record-breaking performance of deep neural networks (DNNs) comes with heavy parameter budgets, which force weights into external dynamic random access memory (DRAM) for storage. The prohibitive energy cost of DRAM accesses makes DNN deployment on resource-constrained devices nontrivial and calls for minimizing the movement of weights and data in order to improve energy efficiency. Driven by this critical bottleneck, we present SmartDeal, a hardware-friendly algorithm framework that trades higher-cost memory storage/access for lower-cost computation in order to aggressively boost storage and energy efficiency for both DNN inference and training. The core technique of SmartDeal is a novel DNN weight matrix decomposition framework with a separate structural constraint on each matrix factor, carefully crafted to unleash the hardware-aware efficiency potential. Specifically, we decompose each weight tensor as the product of a small basis matrix and a large, structurally sparse coefficient matrix whose nonzero elements are readily quantized to powers of 2. The resulting sparse and readily quantized DNNs enjoy greatly reduced energy consumption in data movement as well as weight storage, while incurring minimal overhead to recover the original weights, because recovery needs only sparse bit operations and cost-favorable computations. Beyond inference, we take another leap to embrace energy-efficient training by introducing several customized techniques that address the unique roadblocks arising in training while preserving the SmartDeal structures. We also design a dedicated hardware accelerator that fully utilizes the new weight structure to improve real energy efficiency and latency. We conduct experiments on both vision and language tasks, with nine models, four datasets, and three settings (inference-only, adaptation, and fine-tuning). Our extensive results show that 1) when applied to inference, SmartDeal achieves up to 2.44x improvement in energy efficiency as evaluated using real hardware implementations, and 2) when applied to training, SmartDeal can lead to 10.56x and 4.48x reductions in storage and training energy cost, respectively, with usually negligible accuracy loss compared to state-of-the-art training baselines. Our source code is available at: https://github.com/VITA-Group/SmartDeal.
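The weight structure described in this abstract can be illustrated with a small sketch. It is not the authors' algorithm: the function names, the row-wise sparsity pattern, the exponent range, and the toy matrices are all assumptions chosen for illustration. It only shows how a weight matrix could be rebuilt from a small basis and a sparse coefficient matrix whose nonzeros are signed powers of 2.

```python
import math

def quantize_pow2(x, min_exp=-4, max_exp=0):
    """Round a nonzero value to the nearest signed power of 2 (nearest in
    the log2 domain), clamping the exponent; zeros stay zero.
    The exponent range is a hypothetical choice."""
    if x == 0:
        return 0.0
    exp = round(math.log2(abs(x)))
    exp = max(min_exp, min(max_exp, exp))
    return math.copysign(2.0 ** exp, x)

def sparsify_rows(coeff, keep_rows):
    """Structured sparsity (assumed here to be row-wise): zero every row
    whose index is not in keep_rows."""
    return [row[:] if i in keep_rows else [0.0] * len(row)
            for i, row in enumerate(coeff)]

def matmul(a, b):
    """Plain dense matrix product for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Toy data (hypothetical): a 2x2 basis and a 4x2 coefficient matrix.
basis = [[1.0, -2.0],
         [0.5, 3.0]]
coeff = [[0.9, 0.3],
         [0.1, -0.55],
         [0.26, 0.0],
         [-1.2, 0.7]]

# Apply the two constraints to the coefficient matrix, then rebuild weights.
coeff_q = [[quantize_pow2(v) for v in row]
           for row in sparsify_rows(coeff, keep_rows={0, 1, 3})]
weights = matmul(coeff_q, basis)

print(coeff_q)     # [[1.0, 0.25], [0.125, -0.5], [0.0, 0.0], [-1.0, 0.5]]
print(weights[0])  # [1.125, -1.25]
```

Because every nonzero coefficient is a signed power of 2, each multiplication in the rebuild could in principle be replaced by a bit shift in fixed-point hardware, and only the small basis plus sign/exponent codes would need to be stored.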

Neuromorphic cognitive computing offers a bio-inspired means to approach the natural intelligence of biological neural systems in silicon integrated circuits. Typically, such circuits either reproduce biophysical neuronal dynamics in great detail as tools for computational neuroscience, or abstract away the biology by simplifying the functional forms of neural computation in large-scale systems for machine intelligence with high integration density and energy efficiency. Here we report a hybrid that offers biophysical realism in the emulation of multi-compartmental neuronal network dynamics at very large scale with high implementation efficiency, yet with high flexibility in configuring the functional form and the network topology. The integrate-and-fire array transceiver (IFAT) chip emulates the continuous-time analog membrane dynamics of 65k two-compartment neurons with conductance-based synapses. Fired action potentials are registered as address-event encoded output spikes, while the four types of synapses coupling to each neuron are activated by address-event decoded input spikes for fully reconfigurable synaptic connectivity, facilitating virtual wiring implemented by routing address-event spikes externally through a synaptic routing table. The peak conductance strength of synapse activation, specified by the address-event input, spans three decades of dynamic range and is digitally controlled by pulse width and amplitude modulation (PWAM) of the drive voltage activating the log-domain linear synapse circuit. Two nested levels of micro-pipelining in the IFAT architecture improve both the throughput and the efficiency of synaptic input. This two-tier micro-pipelining results in a measured sustained peak throughput of 73 Mspikes/s and an overall chip-level energy efficiency of 22 pJ/spike. Non-uniformity in digitally encoded synapse strength due to analog mismatch is mitigated through single-point digital offset calibration. Combined with the flexibly layered and recurrent synaptic connectivity provided by hierarchical address-event routing of registered spike events through external memory, the IFAT lends itself to efficient large-scale emulation of general biophysical spiking neural networks, as well as rate-based mapping of rectified linear unit (ReLU) neural activations.
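The "virtual wiring" idea in this abstract, where output spike addresses are expanded into input events by a lookup in an externally stored routing table, can be sketched in software. This is only a conceptual illustration: the table layout, the `route_events` function, the integer strength codes, and the example addresses are assumptions, not the chip's actual event format or routing hierarchy.

```python
NUM_SYNAPSE_TYPES = 4  # the abstract mentions four synapse types per neuron

def route_events(spike_addresses, routing_table):
    """Expand output spike addresses into input events via table lookup.

    routing_table maps a source neuron address to a list of
    (target_address, synapse_type, strength_code) tuples (assumed layout).
    Sources with no table entry produce no events.
    """
    events = []
    for src in spike_addresses:
        for target, syn_type, strength in routing_table.get(src, []):
            if not 0 <= syn_type < NUM_SYNAPSE_TYPES:
                raise ValueError(f"invalid synapse type {syn_type}")
            events.append((target, syn_type, strength))
    return events

# Hypothetical connectivity: neuron 0 fans out to neurons 1 and 2,
# neuron 1 feeds back to neuron 0 (a recurrent connection).
table = {
    0: [(1, 0, 5), (2, 3, 1)],
    1: [(0, 1, 2)],
}

print(route_events([0, 1, 7], table))
# [(1, 0, 5), (2, 3, 1), (0, 1, 2)]
```

Rewiring the network in this scheme means editing table entries rather than changing any physical connection, which is what makes layered and recurrent topologies equally easy to express.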

Articles published on External Memory

MoViT: Memorizing Vision Transformers for Medical Image Analysis.

Utilizing a feature-aware external memory network for helpfulness prediction in e-commerce reviews

SmartDeal: Remodeling Deep Network Weights for Efficient Inference and Training.

Ferroelectric-defined reconfigurable homojunctions for in-memory sensing and computing.

Memory capacity of recurrent neural networks with matrix representation

AoCStream: All-on-Chip CNN Accelerator with Stream-Based Line-Buffer Architecture and Accelerator-Aware Pruning.

Polyomino: A 3D-SRAM-Centric Accelerator for Randomly Pruned Matrix Multiplication With Simple Reordering Algorithm and Efficient Compression Format in 180-nm CMOS

A temporally and spatially local spike-based backpropagation algorithm to enable training in hardware

A 22-pJ/spike 73-Mspikes/s 130k-compartment neural array transceiver with conductance-based synaptic and membrane dynamics.

Improving Structured Grid-Based Sparse Matrix-Vector Multiplication and Gauss–Seidel Iteration on GPDSP

External Slot Relationship Memory for Multi-Domain Dialogue State Tracking

Age-related differences in memory when offloading important information.

Generating synthetic clinical data that capture class imbalanced distributions with generative adversarial networks: Example using antiretroviral therapy for HIV

Preliminary Evidence for Dementia Collaborative Coaching.

ELECTRONIC EDUCATIONAL CONTENT FORMATION WITH THE APPLICATION OF CLOUD TECHNOLOGIES

LL‐FMC: Low‐latency frame memory compression scheme with high reconstructed quality

Learning to Memorize Entailment and Discourse Relations for Persona-Consistent Dialogues

Proof-of-Concept Real-Time Implementation of Interleavers for Optical Satellite Links

Does memory rehabilitation improve health outcomes in people with multiple sclerosis? A Cochrane Review summary with commentary.

Compressing fully connected layers of deep neural networks using permuted features
