As the exponential performance-per-watt gains from traditional CMOS scaling come to an end, new computing paradigms are being seriously considered. Neural network algorithms such as deep neural networks (DNNs) are well matched to the analog, in-memory computing paradigm, with the potential for orders-of-magnitude improvement over today's special-purpose chips. In this analog paradigm, neural network weights are stored as conductance levels in a physical matrix of tunable nonvolatile resistance memory devices, such as oxide-based resistive memory (ReRAM), phase change memory (PCRAM), three-terminal floating-gate memory, and redox memories. Our detailed analysis shows that a neural processing unit (NPU) analog core based on these devices can perform individual inference and training operations at energies as low as 10 fJ per operation (100 TOPS/W), a greater than 100x improvement over the best modern digital neural training processors. This would deliver revolutionary performance for deep neural network accelerators, which are now found in computing from smartphones to datacenters.

The improvements originate from the efficient mapping of key inference and training operations directly onto the physical memory array. The multiply-accumulate operation is performed physically: Ohm's law multiplies each input voltage by a weight conductance, and Kirchhoff's current law accumulates the resulting currents along each column (Fig. 1). The enormous energy expense of moving data between an arithmetic logic unit and a register several times per training operation is thereby reduced by a factor roughly proportional to the number of rows (or columns) of the neural network weight matrix (typically >100).

Analog array benefits have been well established for the inference stage of neural networks and are now under early commercial development. Harnessing analog arrays for neural network training addresses a significantly more energy-intensive task than inference alone, but analog training remains elusive due to device-level challenges. Training places electrical constraints on the analog memory far beyond those for inference-only accelerators. Foremost among these challenges is obtaining sufficient "linearity" and "symmetry" in analog memory devices. Conductance increments during a weight update must occur without feedback to verify the correct conductance change, as such an iterative process would negate the energy and latency advantages over digital CMOS. This open-loop weight update scheme requires that the analog memory change its conductance by a predictable amount, linearly proportional to the number of write pulses and regardless of the initial state; this property is known as device "linearity." The degree of linearity often differs between positive and negative conductance increments, indicating poor device "symmetry." Furthermore, devices in an array must have significantly overlapping conductance ranges over many cycles. If the analog memory elements do not have sufficient linearity and symmetry, the neural network will suffer unacceptable accuracy loss during training. This remains an area of active research, with several strong two- and three-terminal candidates. Sandia has developed a characterization methodology and modeling code, CrossSim, which allows training accuracy to be evaluated for experimental devices.
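To make the crossbar mapping and the open-loop update scheme concrete, the following minimal NumPy sketch illustrates both. It is illustrative only, not CrossSim or a hardware model: the conductance range and step parameters are assumed values, and the "soft-bounds" update rule is just one common way to model state-dependent (nonlinear, asymmetric) conductance increments.

import numpy as np

# Assumed, illustrative device parameters (not measured values)
G_MIN, G_MAX = 1e-6, 1e-4  # programmable conductance range, in siemens

def mac_column_currents(G, v_in):
    """One analog multiply-accumulate read of a crossbar.

    Ohm's law performs each multiply (i_ij = G_ij * v_i) and
    Kirchhoff's current law sums the currents flowing down each
    column, so a single read yields a full vector-matrix product.
    """
    return G.T @ v_in  # column currents, shape (n_cols,)

def open_loop_update(g, n_pulses, alpha_p=2e-6, alpha_d=2e-6):
    """Blind (no read-verify) write pulses applied to one device.

    n_pulses > 0 potentiates, n_pulses < 0 depresses.  The
    state-dependent "soft-bounds" step models nonlinearity, and
    alpha_p != alpha_d would model asymmetry; an ideal device
    would instead change by a constant, sign-symmetric step.
    """
    for _ in range(abs(n_pulses)):
        if n_pulses > 0:   # step shrinks as g approaches G_MAX
            g += alpha_p * (G_MAX - g) / (G_MAX - G_MIN)
        else:              # step shrinks as g approaches G_MIN
            g -= alpha_d * (g - G_MIN) / (G_MAX - G_MIN)
    return float(np.clip(g, G_MIN, G_MAX))

rng = np.random.default_rng(0)
G = rng.uniform(G_MIN, G_MAX, size=(128, 64))  # 128x64 weight array
v = rng.uniform(0.0, 0.2, size=128)            # row read voltages
currents = mac_column_currents(G, v)           # 64 accumulated column currents

Because no read-verify step appears in open_loop_update, any state dependence of the step size translates directly into weight-update error during training, which is exactly the linearity and symmetry problem described above.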
Two-terminal emerging memories, including oxide-based resistive memory (ReRAM) and phase change memory (PCRAM), are very dense and capable of high-speed switching. However, typical TaOx ReRAM and GST PCRAM have been evaluated and found to suffer unacceptable accuracy losses due to the nonlinearity and asymmetry inherent to these devices. In ReRAM the most likely cause is the nonlinear acceleration of filament growth as the device is switched from the low-conductance to the high-conductance state; in PCRAM it is the abrupt amorphization in the high-to-low conductance change during the RESET process. Recently, three-terminal floating-gate semiconductor-oxide-nitride-oxide-semiconductor (SONOS) cells, close cousins of NAND flash, have been demonstrated to achieve high training accuracy at low energy, and they may be the best near-term weight storage devices. However, the challenges of SONOS include relatively long write latency (>1 μs), high write voltage, limited scalability, and low endurance. The most novel emerging class of three-terminal devices, nonvolatile redox transistors, shows excellent training behavior at very low write voltages. These three-terminal elements, such as Sandia's Lithium-Ion Synaptic Transistor for Analog Computing (LISTA), utilize the electrochemical mechanism of a lithium-ion battery, in which the cathode conductivity is modulated by lithium- or proton-ion transport (charging and discharging the battery). While very promising, these devices face research challenges in speed, endurance, and CMOS integration.
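The contrast between these device classes can be illustrated with a toy comparison. The sketch below uses assumed, normalized update models (not measured data, and not the CrossSim characterization methodology itself) to contrast a nonlinear, asymmetric ReRAM/PCRAM-like device with a near-ideal linear, symmetric redox-transistor-like device via a simple pulse-trace asymmetry metric:

import numpy as np

def pulse_trace(step_up, step_down, n=100):
    """Normalized conductance trace: n potentiating pulses, then n depressing."""
    g, trace = 0.0, []
    for _ in range(n):
        g = min(g + step_up(g), 1.0)
        trace.append(g)
    for _ in range(n):
        g = max(g - step_down(g), 0.0)
        trace.append(g)
    return np.array(trace)

# Two assumed device models on a normalized [0, 1] conductance range: the
# first has an up step that collapses near full conductance (nonlinearity)
# and differs from its down step (asymmetry); the second has constant,
# matched steps, idealizing the near-linear behavior of redox transistors.
nonlinear = pulse_trace(lambda g: 0.05 * (1.0 - g), lambda g: 0.10 * g)
linear = pulse_trace(lambda g: 0.01, lambda g: 0.01)

def asymmetry(trace, n=100):
    """Mean up/down step mismatch, paired approximately by conductance level.

    Zero for a perfectly linear, symmetric device.
    """
    up = np.diff(trace[:n])    # potentiation steps (positive)
    down = np.diff(trace[n:])  # depression steps (negative)
    return float(np.mean(np.abs(up + down[::-1])))

print(f"ReRAM/PCRAM-like asymmetry:      {asymmetry(nonlinear):.4f}")
print(f"redox-transistor-like asymmetry: {asymmetry(linear):.4f}")

In a characterization methodology like the one described above, update traces of this kind, measured from real devices, can be fed into a training simulation to predict the resulting accuracy loss.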
This presentation will discuss current research directions for key two- and three-terminal analog neural memory devices, with a focus on areas where materials and device innovation can drive significant progress in neural training technology.

SNL is managed and operated by NTESS under DOE NNSA contract DE-NA0003525.

Figure 1