The energy consumption of training and deploying state-of-the-art artificial intelligence (AI) models has grown exponentially, driven by increasing model parameters and the training data they require. Recent models demand over 1 billion PetaFLOPs of total computation for training, which can take weeks to complete. With current GPU efficiency at approximately 1-3 TOPS/W, the energy consumed by GPU training alone can exceed 100,000 kWh, equivalent to the monthly energy expenditure of 100 US households. Over the next decade, these models are expected to scale up further, driving total computing energy to constitute a significant portion of global consumption. In this talk, we describe both synaptic and neuronal devices that can accelerate AI algorithms with potentially orders-of-magnitude improvements in power efficiency.

First, we describe an oscillatory retinal neuron (ORN) that directly converts incident DC light into voltage spikes. Coupled arrays of this device form an imager that carries out in-sensor processing while capturing an image. Uniquely, the conversion from input light to output voltage spikes occurs without external power. When coupled in arrays, neighboring neurons interact with each other to shape the spiking frequency spectrum, allowing the arrays to carry out frequency-multiplexed computation on an input image. This approach is shown to achieve >20,000 TOPS/W, multiple orders of magnitude beyond current approaches. Theory and simulation are used to elucidate how coupled ORNs carry out computation on an input image: when the neurons are coupled, each output frequency band encodes a distinct computation on the input, and by tuning the coupling impedances and the frequency bands, user-defined computations can be carried out. We experimentally demonstrate this with a 3x3 array that performs simultaneous edge detection, intensity filtering, image segmentation, and other functions. This hardware is then used to demonstrate an improvement in MNIST handwritten-digit classification accuracy over a traditional imager connected to a fully connected network.

Next, we describe spiking synaptic devices that can be fabricated directly in the back end of line of CMOS. These devices consist of an InP transistor channel with an engineered gate stack. Using a uniform gate insulator, we demonstrate behaviors of biological synapses such as potentiation, depression, spike-number-dependent plasticity, and spike-timing-dependent plasticity. By introducing a heterostructured gate insulator, it is shown that short-term to long-term memory transitions can be designed into the device. Finally, using a transparent gate, we demonstrate an in-sensor synaptic phototransistor and evaluate the performance of these devices at the system level.
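As a rough sanity check of the training-energy figure quoted in the opening paragraph, the sketch below converts the stated compute budget and GPU efficiency range into kWh. The per-household monthly consumption (~900 kWh) is an assumed typical value, not a number taken from the talk.

```python
# Hedged back-of-envelope check of the quoted training-energy figure.
# Assumes ~1e9 PetaFLOPs of total training compute (as stated above) and
# ~900 kWh/month per US household (an assumed typical value).
total_flops = 1e9 * 1e15            # 1 billion PetaFLOPs -> FLOPs

for tops_per_watt in (1.0, 3.0):    # quoted GPU efficiency range
    joules = total_flops / (tops_per_watt * 1e12)   # FLOPs / (FLOP per joule)
    kwh = joules / 3.6e6                            # joules -> kWh
    print(f"{tops_per_watt} TOPS/W -> {kwh:,.0f} kWh "
          f"(~{kwh / 900:,.0f} household-months)")

# 1 TOPS/W gives ~278,000 kWh and 3 TOPS/W gives ~93,000 kWh, consistent
# with 'GPU training energy consumption alone can exceed 100,000 kWh'.
```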
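To illustrate the idea of frequency-multiplexed computation in a coupled array, the following is a minimal toy sketch, not the authors' device model: a 3x3 grid of leaky integrate-and-fire relaxation oscillators whose spiking rate tracks local light intensity, with nearest-neighbor diffusive coupling. The coupling strength, time constant, and drive values are all illustrative assumptions; the spectrum of the pooled output shows how coupling and pixel intensities together shape power in different frequency bands.

```python
# Toy coupled-oscillator imager sketch (illustrative only, not the ORN physics).
import numpy as np

def grid_laplacian(rows, cols):
    """Nearest-neighbor diffusive coupling matrix for a rows x cols grid."""
    n = rows * cols
    idx = np.arange(n).reshape(rows, cols)
    L = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    L[idx[r, c], idx[rr, cc]] += 1.0
                    L[idx[r, c], idx[r, c]] -= 1.0
    return L

def simulate(drive, g=0.3, tau=1e-3, v_th=1.0, dt=1e-5, t_end=0.1):
    """drive: 2D array of dimensionless photo-drives (>1 so every cell spikes)."""
    rows, cols = drive.shape
    L = grid_laplacian(rows, cols)
    I = drive.ravel()
    v = np.zeros(I.size)
    steps = int(t_end / dt)
    pooled = np.zeros(steps)
    for k in range(steps):
        v += (-v + I + g * (L @ v)) * (dt / tau)   # leaky integration + coupling
        fired = v >= v_th
        pooled[k] = fired.sum()                    # pooled spike train
        v[fired] = 0.0                             # reset after a spike
    spectrum = np.abs(np.fft.rfft(pooled - pooled.mean()))
    freqs = np.fft.rfftfreq(steps, dt)
    return freqs, spectrum

if __name__ == "__main__":
    image = np.array([[2.0, 2.0, 2.0],
                      [2.0, 8.0, 2.0],
                      [2.0, 2.0, 2.0]])            # bright center pixel
    freqs, spectrum = simulate(image)
    for i in np.argsort(spectrum)[-3:][::-1]:      # strongest frequency bands
        print(f"peak near {freqs[i]:.0f} Hz")
```

Changing the coupling strength `g` or the pixel intensities shifts which bands carry power, which is the sense in which different frequency bands can be read out as different functions of the input image.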
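For the synaptic devices, spike-timing-dependent plasticity is commonly summarized with the standard pair-based phenomenological rule sketched below; the amplitudes and time constants used here are generic illustration values, not measured parameters of the InP devices described above.

```python
# Standard pair-based STDP rule (illustrative parameter values).
import math

def stdp_dw(dt_ms, a_plus=0.05, a_minus=0.055, tau_plus=20.0, tau_minus=20.0):
    """Weight change for a pre/post spike pair, dt_ms = t_post - t_pre.

    Positive dt (pre before post) potentiates; negative dt depresses.
    """
    if dt_ms >= 0:
        return a_plus * math.exp(-dt_ms / tau_plus)
    return -a_minus * math.exp(dt_ms / tau_minus)

for dt in (-40, -10, 10, 40):
    print(f"dt = {dt:+d} ms -> dw = {stdp_dw(dt):+.4f}")
```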