Bruno: backpropagation running undersampled for novel device optimisation
Abstract Recent efforts to improve the efficiency of neuromorphic and machine learning systems have centred on developing specialised hardware for neural networks. These systems typically feature architectures that go beyond the von Neumann model employed in general-purpose hardware such as GPUs, offering potential efficiency and performance gains. However, neural networks developed for specialised hardware must take its specific characteristics into account. This requires novel training algorithms and accurate hardware models, since such hardware cannot be abstracted as a general-purpose computing platform. In this work, we present a bottom-up approach to training neural networks for hardware-based spiking neurons and synapses, built using ferroelectric capacitors (FeCAPs) and resistive random-access memories (RRAMs), respectively. Unlike the common approach of designing hardware to fit abstract neuron or synapse models, we start with compact models of the physical devices to model the computational primitives. Based on these models, we have developed a training algorithm (BRUNO) that can reliably train the networks even under hardware limitations such as stochasticity or low bit precision. We analyse BRUNO and compare it with Backpropagation Through Time, and we test it on two spatio-temporal datasets. The first is a music prediction dataset, where a network composed of ferroelectric leaky integrate-and-fire (FeLIF) neurons predicts, at each time step, the next musical note that should be played. The second consists of classifying Braille letters using a network composed of quantised RRAM synapses and FeLIF neurons. The performance of this network is then compared with that of networks composed of LIF neurons. Experimental results show the potential advantages of BRUNO in reducing the time and memory required to detect spatio-temporal patterns with quantised synapses.
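To make the neuron model mentioned in the abstract concrete, the following is a minimal sketch of a generic discrete-time leaky integrate-and-fire (LIF) neuron. It is not the FeLIF compact model and not BRUNO itself; the function name `simulate_lif` and the parameters `tau`, `v_threshold`, `v_reset` and `dt` are assumptions chosen for illustration.

```python
def simulate_lif(input_current, tau=20.0, v_threshold=1.0, v_reset=0.0, dt=1.0):
    """Simulate a generic discrete-time leaky integrate-and-fire neuron.

    Returns the spike train (0 or 1 per step) and the membrane potential
    recorded after each step. All parameter values are illustrative.
    """
    decay = 1.0 - dt / tau  # first-order (Euler) leak factor per time step
    v = v_reset
    spikes, trace = [], []
    for i_t in input_current:
        v = decay * v + i_t      # leak, then integrate the input
        if v >= v_threshold:     # fire once the threshold is reached
            spikes.append(1)
            v = v_reset          # hard reset after a spike
        else:
            spikes.append(0)
        trace.append(v)
    return spikes, trace


# Constant input: the neuron charges for a few steps, fires, and resets.
spikes, trace = simulate_lif([0.3] * 10)
print(spikes)
```

A training method such as Backpropagation Through Time would unroll this loop over all time steps and store the trace, which is where the time and memory cost discussed in the abstract comes from.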
- Research Article
2
- 10.1149/ma2024-01573001mtgabs
- Aug 9, 2024
- Electrochemical Society Meeting Abstracts
Recently, memristive devices such as phase change random access memory (PCRAM), resistive random access memory (ReRAM), and ferroelectric random access memory (FeRAM) have been actively researched to implement hyper-scale synaptic cores of various ANNs, i.e., deep neural networks (DNNs), spiking neural networks (SNNs), convolutional neural networks (CNNs), and binarized neural networks (BNNs). In particular, ReRAM is currently being highlighted as an artificial synaptic device because of its multilevel capability (> 11 bits) [1], high switching speed (< 100 ps) [2], high endurance (> 10^12 cycles) [3], high scalability, and complementary metal-oxide-semiconductor (CMOS) compatibility [4]. ReRAM is generally divided into two types: valence change memory (VCM) cells having oxygen vacancy filaments and electrochemical metallization (ECM) cells having conductive metal filaments. VCM cells have great retention but exhibit high switching current (~100 μA). In contrast, ECM cells exhibit lower switching current (~10 μA) than VCM cells but have poor retention due to diffusion phenomena at high external temperatures [5]. Because of the high switching current of VCM cells and the poor reliability of ECM cells, both have limitations as synaptic devices in next-generation neuromorphic systems that target an energy consumption similar to that of the human brain (~20 W). In this study, for the first time, we developed a Ru-based ultra-low-power (< 1 μA) hybrid synaptic memristor having a simultaneous-controlled mechanism of Ru cations and oxygen anions in a resistive switching layer. To elucidate the simultaneous-controlled mechanism of Ru cations and oxygen anions, we fabricated the VCM, ECM, and Ru-based hybrid memristors and performed x-ray photoelectron spectroscopy (XPS) and time-of-flight secondary ion mass spectrometry (ToF-SIMS) analysis depending on the resistance states (i.e., pristine, set, and reset).
In addition, the mobility of mobile species in the VCM, ECM, and Ru-based hybrid memristors was calculated indirectly via electrical properties (switching speed, operating voltage) and ToF-SIMS depth profiles, revealing the mechanism by which the Ru-based hybrid memristor has low switching currents (< 1 μA). Finally, the power consumption of active synaptic cores in the training and inference process of the designed DNN was evaluated for the VCM, ECM, and Ru-based hybrid memristors. The simultaneous-controlled mechanism of Ru cations and oxygen anions of the Ru-based hybrid memristor and its application to DNN will be presented in detail. Acknowledgement This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2023-00260527) and Institute of Information & communications Technology Planning & Evaluation (IITP) under the artificial intelligence semiconductor support program to nurture the best talents (IITP-(2023)-RS-2023-00253914) grant funded by the Korea government (MSIT). References [1] Rao, M. et al. Thousands of conductance levels in memristors integrated on CMOS. Nature 615, 823–829 (2023). [2] Yu, S., Wu, Y., Jeyasingh, R., Kuzum, D. & Wong, H. S. P. An electronic synapse device based on metal oxide resistive switching memory for neuromorphic computation. IEEE Trans. Electron Devices 58, 2729–2737 (2011). [3] Lee, H. Y. et al. Evidence and solution of over-RESET problem for HfOX based resistive memory with sub-ns switching speed and high endurance. Tech. Dig. - Int. Electron Devices Meet. IEDM 7–10 (2010) doi:10.1109/IEDM.2010.5703395. [4] Wong, H. S. P. et al. Metal-oxide RRAM. Proc. IEEE 100, 1951–1970 (2012). [5] Yoon, J. H. et al. A Low-Current and Analog Memristor with Ru as Mobile Species. Adv. Mater. 32, 1–9 (2020).
- Research Article
- 10.1149/ma2025-01361729mtgabs
- Jul 11, 2025
- Electrochemical Society Meeting Abstracts
The microcontroller (MCU) market is progressively adopting alternatives to NOR Flash as embedded memory solutions below the 28nm node. Foundries already propose Magnetic Random Access Memory (MRAM) or oxide-based Resistive RAM (ReRAM) integrated in the Back-End-Of-the-Line (BEOL) above CMOS from the 22nm node down to 12nm [1], [2], [3], [4], [5]. Another company offers Phase-Change-Memory (PCM) as the resistive element at the 28nm and 18nm nodes [6], [7]. The physics behind the switching mechanisms of these devices differs, and so do their comparative advantages and drawbacks. In this presentation, based on our experience, we will review the advantages and challenges of the different approaches. At CEA-Leti, we fabricated ReRAMs on 28nm CMOS, showing intrinsic reliability of more than 1E5 cycles endurance and SMT compliance. The overall ReRAM performance is sensitive to the forming protocol optimization and the programming conditions, so that, using a smart program-and-verify algorithm, no failures were observed on a 1Mb array [8]. Finally, we also successfully fabricated a functional 8Mb ReRAM macro on 22nm CMOS. This technology is very well adapted to low-cost MCUs, owing to its simple manufacturing. Phase-Change-Memory (PCM) provides an alternative solution for high-end MCUs that require high memory capacity and high reliability. Indeed, because it can be integrated with a vertical bipolar selector and not necessarily with a MOS selector, PCM bitcell density exceeds the competition at the 28nm node, with a bitcell size as low as 0.019 µm² [9]. In sub-10 nm technology nodes, in order to limit the impact of the memory fabrication on logic, interest could rise in a BEOL selector and a crossbar architecture, allowing the complete memory array to be integrated over the periphery (leading to area saving). CEA-Leti demonstrated the co-integration of an embedded PCM device based on Ge-rich GeSbTe alloys with a BEOL Ovonic Threshold Switch selector.
Even if its operating voltage as well as its reading and programming reliability still need to be improved, this solution opens a path for PCM scaling. Not only resistive but also capacitive memories are possible candidates to replace embedded NOR flash. In such devices, the switching mechanism is driven by the electric field and not by the current, which enables a possibly smaller selector and thus a higher bitcell density than resistive memory. Ferroelectric capacitor memories are thus promising candidates. We integrated HfZrO2-based ferroelectric capacitors connected to the drain of nMOS selectors (in a so-called FeRAM configuration) at the 22nm technology node. Zero bit failures up to 1E10 cycles are demonstrated on these 22nm FeRAM arrays at 2.4V, with a median Memory Window larger than 100mV. In order to reduce the FeRAM bitcell footprint, ferroelectric material deposition into a trench capacitor was also demonstrated, leading to a remanent polarization 2·Pr of up to 140 μC/cm² (normalized by footprint). This FeRAM solution can be very attractive for embedded SRAM/DRAM replacement, provided the endurance can be further improved (together with the speed and density). When the ferroelectric capacitor is not connected to the drain of the transistor but integrated at the gate side, the memory is called FeMFET or FeFET. Much research is currently ongoing to fabricate FeFETs completely in the Back-End-Of-Line, especially exploiting oxide semiconductors (like IGZO, IWO, ITO) as the channel materials, which can be deposited at low temperature. Such devices show better endurance than Si-based FeFETs, but their stability and reliability still need to be improved. To conclude, MRAM is currently the main candidate proposed by foundries to replace embedded NOR flash below the 28nm node. It was qualified for automotive grade-0, which represents the highest reliability criterion, together with a 6ns read access time when integrated above a 16nm finFET CMOS [4].
But cost continues to drive the MCU market. In this context, ReRAM is very cost-efficient because of its simple fabrication process, and PCM because of its high bitcell density. Finally, ferroelectric memories have been developed more recently and are probably well suited for embedded SRAM/DRAM replacement. [1] M.-F. Chang et al., ISSCC'14; [2] S. Ko et al., VLSI'23; [3] G. Kang et al., VLSI'23; [4] P.-H. Lee et al., ISSCC'23; [5] Y.-C. Huang et al., ISSCC'24; [6] D. Min et al., IEDM'21; [7] F. Disegni et al., VLSI'21; [8] G. Molas et al., IMW'22; [9] F. Arnaud et al., IEDM'18
- Research Article
3
- 10.1109/led.2022.3192262
- Sep 1, 2022
- IEEE Electron Device Letters
Conductance variations of resistive random-access memory (RRAM) are significant challenges that hinder the accurate inference of neural network (NN) hardware. In this study, we exploit the read noise of the RRAM as an active computational enabler for implementing probabilistic NN. As electrical characteristics of RRAM are directly related to the properties of conductive filament (CF), we statistically explore read current of TiOx-based RRAM with different forming conditions and explain the results by linking the CF model. In addition, an array mapping scheme to transfer weights to one transistor-one RRAM (1T1R) array is experimentally demonstrated. Through NN simulations, we verify that the probabilistic NN shows promising results on nonlinear classification problem avoiding overconfidence compared with deterministic NN.
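The idea of using read noise as a source of randomness can be illustrated with a small sketch: each forward pass perturbs the stored weights, and the class probabilities are averaged over many noisy reads. The zero-mean Gaussian noise model, the single linear layer, and all names (`noisy_forward`, `predictive_distribution`, `noise_std`) are assumptions for illustration, not the measured TiOx read-current statistics or the network from the paper.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def noisy_forward(weights, x, noise_std, rng):
    """One forward pass of a linear layer with fresh noise on every weight read."""
    logits = []
    for row in weights:
        logits.append(sum((w + rng.gauss(0.0, noise_std)) * xi
                          for w, xi in zip(row, x)))
    return softmax(logits)


def predictive_distribution(weights, x, noise_std=0.1, n_samples=200, seed=0):
    """Average the class probabilities over n_samples noisy forward passes."""
    rng = random.Random(seed)
    mean = [0.0] * len(weights)
    for _ in range(n_samples):
        probs = noisy_forward(weights, x, noise_std, rng)
        mean = [m + p / n_samples for m, p in zip(mean, probs)]
    return mean


weights = [[2.0, -1.0], [-1.0, 2.0]]  # hypothetical 2-class, 2-input layer
print(predictive_distribution(weights, [1.0, 0.8]))
```

With `noise_std=0.0` every pass is identical and the result reduces to the deterministic softmax output.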
- Conference Article
- 10.1109/nvmts.2013.6632857
- Oct 1, 2012
Ferroelectric-based nonvolatile memory (FeRAM) is now a mainstream product category supported by Ramtron, Fujitsu, Texas Instruments, IBM, and Matsushita. The internal array of ferroelectric capacitors in FeRAMs consumes absolutely no energy when the chip is powered. This lack of energy consumption combined with SRAM-like operation at SRAM speeds constitutes a unique advantage for FeRAM memory in low-power embedded applications. Because the ferroelectric technology is buried deep inside the IC package surrounded by digital interface circuits, the true nature of ferroelectric memory capacitor operation is obscured from the user. It is possible to operate a simple ferroelectric nonvolatile memory from the I/O pins of a microprocessor using discrete ferroelectric capacitors in a Sawyer-Tower circuit configuration. Designing and operating a ferroelectric memory with discrete capacitors provides insight into the nature of FeRAMs, their internal operation, and their reliability. The read and write times of the discrete memory will be fast, limited only by the current capacity of the I/O pins. The discrete ferroelectric capacitors will retain their written state without concern for whether the microcontroller is powered or not. Unlike EEPROM or FLASH transistors, discrete ferroelectric capacitors can be physically handled during retention without corrupting their stored states.
- Research Article
7
- 10.1109/ted.2020.3028528
- Oct 21, 2020
- IEEE Transactions on Electron Devices
The main challenge in ferroelectric (FE) random access memory (FRAM) scaling is to maintain a high polarization density on the vertical sidewall of 3-D FE capacitors. Two simple and effective methods (stress engineering and optimized interface orientation) are proposed to facilitate the preferential transition from the tetragonal to the orthorhombic phase for ferroelectricity. Four FE phase-progressive experiments were conducted for 2-D/3-D FRAMs with external stress sources and an interfacial layer (IL). Both 2-D and 3-D FRAMs show the wake-up free feature with the presence of both the external stressor and the optimized IL. To extract the sidewall polarization of 3-D FRAM, a set of testkeys was designed and studied. The 3-D FRAM shows an initial sidewall with good reliability and durability with Pr = 18 μC/cm² and endurance of up to 10^9 cycles. Furthermore, the retention test with the read mode of P0,switch and P1,switch at 85 °C was investigated, and the imprint effect was proved to be the main cause of retention loss.
- Research Article
1
- 10.1149/ma2023-02301560mtgabs
- Dec 22, 2023
- Electrochemical Society Meeting Abstracts
Recently, cross-point synaptic memristor arrays have been employed to implement synaptic cores of various ANNs, i.e., deep neural networks (DNNs), spiking neural networks (SNNs), and convolutional neural networks (CNNs). Most reported studies conducted training and inference of ANNs using analog synaptic weight modulation of memristors such as resistive random access memory (ReRAM), phase change random access memory (PCRAM), and ferroelectric random access memory (FeRAM). In other words, no case has been reported so far in which a quantized multi-bit synaptic memristor is used in a synaptic core of a quantized neural network (QNN). In this study, for the first time, we introduced a multi-bit self-rectifying synaptic memristor with a tri-layer structure composed of an oxygen-rich AlOx rectifying layer, an oxygen-deficient HfOx top switching layer, and an oxygen-rich HfOy bottom switching layer for quantization-aware training of a QNN. The resistive switching and self-rectifying mechanism of the multi-bit self-rectifying synaptic memristor was verified by precisely investigating the migration of oxygen ions and vacancies in the resistive switching layers via ToF-SIMS and XPS depth profiles depending on the resistance states (i.e., pristine, set, and reset). The designed multi-bit self-rectifying synaptic memristor presented linear, discrete, and quantized 4-bit (i.e., 16-level) conductance levels depending on the incremental write pulse number, making it the first 4-bit self-rectifying synaptic memristor. In addition, quantization-aware training (QAT) was conducted using the 4-bit quantized conductance levels of the designed multi-bit self-rectifying synaptic memristor via a straight-through estimator (STE). Finally, three different iris datasets were successfully classified using a quantized neural network designed via SPICE circuit simulation.
The conductance mechanism of the self-rectifying synaptic memristor and its application to QNN will be presented in detail. Acknowledgement This research was supported by National R&D Program through the National Research Foundation of Korea (NRF) funded by Ministry of Science and ICT (2021M3F3A2A01037733)
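Quantization-aware training with a straight-through estimator can be sketched for a single weight as follows. This is a generic illustration, assuming 16 evenly spaced levels in a hypothetical range [-1, 1] and a squared-error loss; it does not reproduce the device's measured conductance levels or the SPICE-based network from the abstract.

```python
def quantize(w, n_levels=16, w_min=-1.0, w_max=1.0):
    """Clip w to [w_min, w_max] and snap it to the nearest of n_levels values."""
    w = min(max(w, w_min), w_max)
    step = (w_max - w_min) / (n_levels - 1)
    return w_min + round((w - w_min) / step) * step


def ste_grad(upstream_grad, w, w_min=-1.0, w_max=1.0):
    """Straight-through estimator: treat rounding as identity in the backward
    pass, and block the gradient where the weight was clipped."""
    return upstream_grad if w_min <= w <= w_max else 0.0


def train_step(w, x, target, lr=0.1):
    """One gradient step on the loss (quantize(w) * x - target) ** 2."""
    y = quantize(w) * x                 # forward pass uses the quantized weight
    grad_wq = 2.0 * (y - target) * x    # gradient w.r.t. the quantized weight
    return w - lr * ste_grad(grad_wq, w)  # update the full-precision copy


w = 0.0
for _ in range(200):
    w = train_step(w, x=1.0, target=0.5)
print(w, quantize(w))
```

The full-precision weight is kept only during training; at inference time only the quantized value would be programmed into the device.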
- Conference Article
17
- 10.23919/date48585.2020.9116549
- Mar 1, 2020
Neural networks (NN) have gained great success in visual object recognition and natural language processing, but these data-intensive applications require huge data movements between computing units and memory. Emerging resistive random-access memory (RRAM) computing systems have demonstrated great potential in avoiding the huge data movements by performing matrix-vector multiplications in memory. However, the nonvolatility of the RRAM devices may lead to potential stealing of the NN weights stored in crossbars, and the adversary could extract the NN models from the stolen weights. This paper proposes an effective security enhancing method for RRAM computing systems to thwart this sort of piracy attack. We first analyze the theft methods of the NN weights. Then we propose an efficient security enhancing technique based on obfuscating the row connections between positive crossbars and their pairing negative crossbars. Two heuristic techniques are also presented to optimize the hardware overhead of the obfuscation module. Compared with existing NN security work, our method eliminates the additional RRAM writing operations used for encryption/decryption, without shortening the lifetime of RRAM computing systems. The experiment results show that the proposed methods ensure the trial times of brute-force attack are more than (16!)^17 and the classification accuracy of the incorrectly extracted NN models is less than 20%, with minimal area overhead.
- Research Article
3
- 10.3389/fnins.2021.806325
- Jan 20, 2022
- Frontiers in Neuroscience
Realization of spiking neural network (SNN) hardware with high energy efficiency and high integration may provide a promising solution to data processing challenges in future internet of things (IoT) and artificial intelligence (AI). Recently, the design of multi-core reconfigurable SNN chips based on resistive random-access memory (RRAM) has been drawing great attention, owing to the unique properties of RRAM, e.g., high integration density, low power consumption, and processing-in-memory (PIM). Therefore, an RRAM-based SNN chip may achieve further improvements in integration and energy efficiency. The design of such a chip faces the following problems: significant delay in pulse transmission due to complex logic control and inter-core communication; high risk of digital, analog, and RRAM hybrid design; and non-ideal characteristics of the analog circuit and RRAM. In order to effectively bridge the gap between device, circuit, algorithm, and architecture, this paper proposes a simulation model, FangTianSim, which covers the analog neuron circuit, the RRAM model and the multi-core architecture, and whose accuracy is at the clock level. This model can be used to verify the functionalities, delay, and power consumption of an SNN chip. This information can be used not only to verify the rationality of the architecture but also to guide the chip design. In order to map different network topologies onto the chip, an SNN representation format, an interpreter, and an instruction generator are designed. Finally, the function of FangTianSim is verified on a liquid state machine (LSM), a fully connected neural network (FCNN), and a convolutional neural network (CNN).
- Research Article
- 10.4028/www.scientific.net/kem.602-603.1052
- Mar 1, 2014
- Key Engineering Materials
Many nonvolatile memory devices, such as ferroelectric random access memory (FeRAM), magnetic random access memory (MRAM), ovonic universal memory (OUM), and resistive random access memory (RRAM), have been widely discussed and investigated. Among these nonvolatile memory devices, RRAM will play an important role because of its non-destructive readout, low operation voltage, high operation speed, long retention time, and simple structure. An RRAM cell consists of only one resistor and one corresponding transistor. In this study, CuO thin films deposited on ITO/glass and Pt/Ti/SiO2/Si substrates for applications in RRAM devices were produced and investigated. The optimal sputtering conditions for the as-deposited CuO thin films were an rf power of 80 W, a chamber pressure of 20 mTorr, a substrate temperature of 580°C, and an oxygen concentration of 40%. The basic mechanisms of the bistable resistance switching were observed. The electrical and physical properties of the CuO thin films for applications in RRAM devices are discussed.
- Research Article
65
- 10.1109/tcad.2018.2824304
- May 1, 2019
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
The training of neural networks (NN) is usually time-consuming and resource intensive. The emerging metal-oxide resistive random-access memory (RRAM) device has shown potential for the computation of NN. The RRAM crossbar structure and multibit characteristics allow the matrix-vector product, the most common operation in NN, to be performed with high energy efficiency. Two challenges exist for realizing NN training based on RRAM. First, the current architectures based on RRAM only support inference and cannot perform the backpropagation (BP) and the weight update required for training. Second, training NN requires a large number of iterations that constantly update the weights to reach convergence. However, this weight update leads to large energy consumption because of the nonideal factors of RRAM. In this paper, we propose a training-in-memory architecture based on RRAM (TIME) and the peripheral circuit design to enable training NN on RRAM. TIME supports the BP and the weight update while maximizing the reuse of the peripheral circuits of the inference operation on RRAM. Meanwhile, a set of optimization strategies focusing on the nonideal factors is designed to reduce the cost of tuning RRAM. We explore the performance of both supervised learning (SL) and deep reinforcement learning (DRL) on TIME. A specific mapping method for DRL is also introduced to further improve energy efficiency. Simulation results show that in SL, TIME can achieve 5.3× higher energy efficiency on average compared with DaDianNao, an application-specific integrated circuit (ASIC) in CMOS technology. In DRL, TIME achieves on average 126× higher energy efficiency than a GPU. If the cost of tuning RRAM can be further reduced, TIME has the potential to boost the energy efficiency by two orders of magnitude compared with ASIC.
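The crossbar matrix-vector product mentioned in the abstract follows from Ohm's and Kirchhoff's laws: each column current is the sum of the row voltages multiplied by the cell conductances. The sketch below models an ideal crossbar and a common way of representing signed weights with a pair of conductances. The mapping, the conductance range (`g_min`, `g_max`) and the function names are assumptions for illustration and are not taken from the TIME architecture.

```python
def crossbar_mvm(voltages, conductances):
    """Column currents of an ideal crossbar: I[j] = sum_i V[i] * G[i][j]."""
    n_cols = len(conductances[0])
    return [sum(v * row[j] for v, row in zip(voltages, conductances))
            for j in range(n_cols)]


def map_weights(weights, g_min=1e-6, g_max=1e-4):
    """Map signed weights in [-1, 1] to a positive and a negative crossbar.

    A positive weight raises the conductance in g_pos, a negative weight
    raises it in g_neg; the other cell of the pair stays at g_min.
    """
    span = g_max - g_min
    g_pos = [[g_min + span * max(w, 0.0) for w in row] for row in weights]
    g_neg = [[g_min + span * max(-w, 0.0) for w in row] for row in weights]
    return g_pos, g_neg


def differential_mvm(voltages, weights, g_min=1e-6, g_max=1e-4):
    """Signed matrix-vector product from the difference of two column currents."""
    g_pos, g_neg = map_weights(weights, g_min, g_max)
    i_pos = crossbar_mvm(voltages, g_pos)
    i_neg = crossbar_mvm(voltages, g_neg)
    span = g_max - g_min
    return [(p - n) / span for p, n in zip(i_pos, i_neg)]


weights = [[0.5, -1.0], [0.25, 0.0]]  # rows = inputs, columns = outputs
print(differential_mvm([1.0, 2.0], weights))
```

Real arrays deviate from this ideal model through wire resistance, device variation and limited conductance levels, which are among the nonideal factors the abstract refers to.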
- Research Article
14
- 10.1109/ted.2021.3131108
- Jan 1, 2022
- IEEE Transactions on Electron Devices
Hf0.5Zr0.5O2 (HZO) ferroelectric random access memory (FeRAM) has been demonstrated in 130 nm node with 1T1C structure. To scale FeRAM to 28 nm or beyond, a high aspect ratio embedded dynamic random-access memory (eDRAM)-like 3-D cylinder capacitor is expected to ensure sufficient cell capacitance and sense margin. In this work, we investigate an alternative approach with 2T1C structure that takes advantage of a back-end-of-line (BEOL) oxide channel writing transistor, a small planar ferroelectric (FE) capacitor, and a silicon logic reading transistor. First, the proof-of-concept of 2T1C bit cell was experimentally demonstrated. Then, the scalability toward 28 nm or beyond was simulated with array-level parasitics. Thanks to the transconductance reading out mechanism, a 900 nm² FE capacitor in 2T1C could significantly reduce energy consumption 6.4–9.6× compared to the traditional 1T1C FeRAM with similar cell area at 28 nm. Moreover, the area ratio between the FE capacitor and the read transistor is investigated both experimentally and with SPICE simulation, where adjustment of the pulsing scheme is needed for the maximum sense margin to occur. Finally, the performance at 7 nm is estimated in terms of read/write energy and cell area.
- Research Article
1
- 10.1063/5.0190195
- Feb 20, 2024
- The Journal of Chemical Physics
This study presents findings indicating that the ferroelectric tunnel junction (FTJ) or resistive random-access memory (RRAM) in one cell can be intentionally selected depending on the application. The HfAlO film annealed at 700 °C shows stable FTJ characteristics and can be converted into RRAM by forming a conductive filament inside the same cell, that is, the process of intentionally forming a conductive filament is the result of defect generation and redistribution, and applying compliance current prior to a hard breakdown event of the dielectric film enables subsequent RRAM operation. The converted RRAM demonstrated good memory performance. Through current-voltage fitting, it was confirmed that the two resistance states of the FTJ and RRAM had different transport mechanisms. In the RRAM, the 1/f noise power of the high-resistance state (HRS) was about ten times higher than that of the low-resistance state (LRS). This is because the noise components increase due to the additional current paths in the HRS. The 1/f noise power according to resistance states in the FTJ was exactly the opposite result from the case of the RRAM. This is because the noise component due to the Poole-Frenkel emission is added to the noise component due to the tunneling current in the LRS. In addition, we confirmed the potentiation and depression characteristics of the two devices and further evaluated the accuracy of pattern recognition through a simulation by considering a dataset from the Modified National Institute of Standards and Technology.
- Research Article
1384
- 10.1145/3007787.3001140
- Jun 18, 2016
- ACM SIGARCH Computer Architecture News
Processing-in-memory (PIM) is a promising solution to address the "memory wall" challenges for future computer systems. Previously proposed PIM architectures put additional computation logic in or near memory. The emerging metal-oxide resistive random access memory (ReRAM) has shown its potential to be used for main memory. Moreover, with its crossbar array structure, ReRAM can perform matrix-vector multiplication efficiently, and has been widely studied to accelerate neural network (NN) applications. In this work, we propose a novel PIM architecture, called PRIME, to accelerate NN applications in ReRAM-based main memory. In PRIME, a portion of ReRAM crossbar arrays can be configured as accelerators for NN applications or as normal memory for a larger memory space. We provide microarchitecture and circuit designs to enable the morphable functions with an insignificant area overhead. We also design a software/hardware interface for software developers to implement various NNs on PRIME. Benefiting from both the PIM architecture and the efficiency of using ReRAM for NN computation, PRIME distinguishes itself from prior work on NN acceleration, with significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance by ~2360× and the energy consumption by ~895×, across the evaluated machine learning benchmarks.
- Conference Article
607
- 10.1109/isca.2016.13
- Jun 1, 2016
Processing-in-memory (PIM) is a promising solution to address the "memory wall" challenges for future computer systems. Previously proposed PIM architectures put additional computation logic in or near memory. The emerging metal-oxide resistive random access memory (ReRAM) has shown its potential to be used for main memory. Moreover, with its crossbar array structure, ReRAM can perform matrix-vector multiplication efficiently, and has been widely studied to accelerate neural network (NN) applications. In this work, we propose a novel PIM architecture, called PRIME, to accelerate NN applications in ReRAM-based main memory. In PRIME, a portion of ReRAM crossbar arrays can be configured as accelerators for NN applications or as normal memory for a larger memory space. We provide microarchitecture and circuit designs to enable the morphable functions with an insignificant area overhead. We also design a software/hardware interface for software developers to implement various NNs on PRIME. Benefiting from both the PIM architecture and the efficiency of using ReRAM for NN computation, PRIME distinguishes itself from prior work on NN acceleration, with significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance by ~2360x and the energy consumption by ~895x, across the evaluated machine learning benchmarks.
- Book Chapter
4
- 10.1007/978-981-13-5950-7_48
- Jan 1, 2019
Promising nanoelectronic memories such as PCRAM, STT-RAM, Ferroelectric FET Memory and Resistive Random Access Memory (RRAM) are capable of substituting conventional memory technologies such as SRAMs, DRAMs and flash memory in future computers. Among these nanoelectronic memories, RRAM offers higher density, lower power consumption, higher speed and better scalability, which can fulfill the requirements of massive data growth and storage. However, it is expected to suffer from numerous faults that reduce the reliability of the system. These faults may arise in any element of the memory system, which includes the marginal circuits, inter junction and memory cell array. The Read 1 Disturbance (R1D) fault is one of the main faults in RRAM; it occurs when the read value is 0 although 1 is the actual value, and is a kind of low-resistance defect. In RRAM, the SET voltage (VSET), bit line voltage, restricted thermal stability and accumulated read current pulses lead to read disturbance faults, and applying the maximum current for a read operation induces the read disturbance fault immediately. As the read current and write current share the same path, read disturbance faults cause a bit flip. The accumulated effect of this read ‘1’ disturbance degrades the memory reliability. This kind of fault changes the value stored in a particular memory cell, which leads to inaccurate read values that keep propagating until a new logic value is written to the same cell. Read disturbance, which lies between the read operation and the read disturbance fault, is a major concern of today’s NVM since it directly affects the performance of the system. According to the March C* test algorithm, the R1D faults of HfO2-based 1T1R RRAM can be sensitized by ensuring the presence of ‘1’ (read 1) after the write ‘1’ operation and detected by another read 1 operation.
In this article, a novel March C2RR algorithm is proposed in which the read operation is repeated twice in the second and fourth memory elements (R3, R4 in M3 and R5, R6 in M4). The proposed method achieves 100% fault coverage, detecting R1D faults and all the random faults effectively.
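The reason a repeated read helps can be shown with a toy fault model. In the sketch below, a faulty cell returns the correct value on the first read of a stored 1 but is flipped to 0 by that read, so only a second read exposes the fault. The class `DisturbCell`, the single march element (w1, r1, r1) and the function names are assumptions for illustration; this is not the March C2RR algorithm from the abstract, whose full element sequence is not given here.

```python
class DisturbCell:
    """Toy memory cell. If faulty, reading a stored 1 returns 1 but the read
    itself flips the stored value to 0 (a read-disturb bit flip)."""

    def __init__(self, faulty=False):
        self.value = 0
        self.faulty = faulty

    def write(self, bit):
        self.value = bit

    def read(self):
        observed = self.value
        if self.faulty and self.value == 1:
            self.value = 0  # the read operation disturbs the cell
        return observed


def march_w1_r1(cells):
    """March element (w1, r1): one read per cell. Returns failing addresses."""
    failing = []
    for address, cell in enumerate(cells):
        cell.write(1)
        if cell.read() != 1:
            failing.append(address)
    return failing


def march_w1_r1_r1(cells):
    """March element (w1, r1, r1): the second read detects the disturbed cell."""
    failing = []
    for address, cell in enumerate(cells):
        cell.write(1)
        first, second = cell.read(), cell.read()
        if first != 1 or second != 1:
            failing.append(address)
    return failing


def make_array():
    """Three-cell array with a read-disturb fault at address 1."""
    return [DisturbCell(), DisturbCell(faulty=True), DisturbCell()]


print(march_w1_r1(make_array()))     # single read per cell
print(march_w1_r1_r1(make_array()))  # double read per cell
```

In this toy model the single-read element reports no failures, while the double-read element reports address 1.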