Abstract

The energy-demanding nature of deep learning (DL) has drawn immense attention to neuromorphic architectures, owing to their ability to operate at very high frequencies with very low energy consumption. To this end, neuromorphic photonics is among the most promising research directions, since it can achieve femtojoule-per-MAC efficiency. Although electro-optical materials provide a fast and efficient platform for DL, they also introduce various noise sources that affect the effective bit resolution, posing new challenges for DL quantization. In this work, we propose a quantization-aware training method that gradually reduces the bit resolution of layers in a mixed-precision manner, enabling lower-precision networks to be deployed and further increasing the computational rate of the developed accelerators while keeping energy consumption low. Exploiting the observation that intermediate layers have lower precision requirements, we propose gradually reducing each layer's bit resolution, with the reduction probability distributed normally across layers. We experimentally demonstrate the advantages of mixed-precision quantization in both performance and inference time. Furthermore, we experimentally evaluate the proposed method on different tasks, architectures, and photonic configurations, highlighting its ability to significantly reduce the average bit resolution of DL models while clearly outperforming the evaluated baselines.
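To illustrate the idea of normally distributing the per-layer reduction probability, the following is a minimal sketch under stated assumptions: it assumes the probability profile is a Gaussian centered on the intermediate layers and that one bit is removed per reduction step, and the function names, the `sigma` parameter, and the schedule loop are hypothetical choices for exposition rather than the authors' exact procedure.

```python
import math
import random


def layer_reduction_probabilities(num_layers, sigma=0.25):
    """Assign each layer a bit-reduction probability following a Gaussian
    profile centered on the middle layers, reflecting the observation that
    intermediate layers tolerate lower precision."""
    center = (num_layers - 1) / 2.0
    weights = []
    for i in range(num_layers):
        # Un-normalized Gaussian weight over the layer index.
        x = (i - center) / (num_layers * sigma)
        weights.append(math.exp(-0.5 * x * x))
    # Scale so the middle layer has probability 1.0 (reduced first, most often).
    peak = max(weights)
    return [w / peak for w in weights]


def reduction_step(bits_per_layer, probs, min_bits=2):
    """One gradual reduction step: layers with higher probability are more
    likely to lose a bit, never dropping below min_bits."""
    return [
        max(min_bits, b - 1) if random.random() < p else b
        for b, p in zip(bits_per_layer, probs)
    ]


# Example: an 8-layer model starting from 8-bit precision everywhere.
probs = layer_reduction_probabilities(8)
bits = [8] * 8
for _ in range(3):  # a few reduction steps interleaved with training epochs
    bits = reduction_step(bits, probs)
print(bits)  # intermediate layers end up with fewer bits on average
```

In this reading, mixed precision emerges naturally: outer layers retain higher bit resolution while intermediate layers are pushed lower, lowering the model's average bit resolution.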
