Training deep neural networks (DNNs) is computationally intensive, but arrays of non-volatile memories such as charge trap flash (CTF) can accelerate DNN operations through in-memory computing. Specifically, the resistive processing unit (RPU) architecture uses a voltage-threshold programming scheme with stochastically encoded pulse trains and analog memory features to accelerate the vector-vector outer product and the weight update in gradient descent algorithms. Although CTF, with its high precision, has been regarded as an excellent choice for implementing RPUs, the way charge accumulates under the applied stochastic pulse trains ultimately determines the final weight update. In this paper, we report non-ideal program-time conservation in CTF, observed through pulsed-input measurements. We experimentally measure the effect of pulse width and pulse gap while keeping the total ON-time of the input pulse train constant, and report three non-idealities: (1) the cumulative threshold-voltage shift decreases when the total ON-time is fragmented into a larger number of shorter pulses, (2) the cumulative shift drops abruptly for pulse widths <2 µs, and (3) the cumulative shift depends on the gap between consecutive pulses, with the reduction in shift recovering for smaller gaps. We explain these non-idealities through a transient enhancement of the tunneling field caused by blocking-oxide trap-charge dynamics. Identifying and modeling the responsible mechanisms and predicting their system-level effects during learning is critical. This non-ideal accumulation is expected to affect algorithms and architectures that rely on devices to implement mathematically equivalent functions for in-memory computing-based acceleration.