Abstract
Deep learning (DL) powers numerous applications on ubiquitous edge devices, but its high resource demands pose a challenge. Approximate computing is often proposed to alleviate this, yet such techniques usually impose a fixed level of accuracy loss. We propose a novel control-theoretic approach for predictive DL inference on resource-constrained devices. Our system dynamically adjusts the approximation level based on a trade-off between resource utilisation and accuracy, taking future demands into account. Extensive experiments across diverse domains (human activity recognition, acoustic scene profiling, and computer vision), with various neural network architectures and approximation techniques, demonstrate that our approach achieves up to 50% energy savings while maintaining the desired inference accuracy and incurring minimal runtime overhead. Furthermore, we showcase our method in a real-world deployment on low-power edge devices and confirm that it outperforms current state-of-the-art solutions.