Abstract

Edge Intelligence (EI) enables local AI processing at the network edge, protecting privacy and reducing data transmission, but deploying resource-intensive neural networks on edge devices remains challenging. Neural architecture search (NAS), valued for its automation and minimal manual intervention, is a pivotal tool for EI. However, existing methods typically optimize resource consumption for specific hardware, yielding hardware-specific neural architectures with limited generalizability. In response, we propose OnceNAS, a novel method that designs and optimizes on-device inference neural networks for resource-constrained edge devices. OnceNAS jointly optimizes parameter count and inference latency alongside inference accuracy, producing lightweight neural networks without sacrificing inference performance. We also introduce an efficient evaluation strategy that assesses multiple metrics simultaneously. Experimental results demonstrate the effectiveness of OnceNAS, which discovers high-performing architectures with a substantial size reduction (10.49x) and speedup (5.45x). OnceNAS thus offers practical value by generating efficient on-device inference neural architectures for resource-constrained edge devices, facilitating real-world applications such as autonomous driving and smart healthcare. Furthermore, we contribute DARTS-Bench, an open-source dataset that provides candidate architectures with hardware-related information and a user-friendly API, facilitating future research in lightweight NAS.
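
The abstract states that OnceNAS jointly optimizes inference accuracy, parameter count, and inference latency. As a rough illustration of what such a multi-objective ranking criterion can look like, the Python sketch below scores candidate architectures with a budget-penalized combination of the three metrics; the function names, budgets, and weights are hypothetical and do not reflect OnceNAS's actual formulation.

```python
# Illustrative sketch only: the abstract does not specify OnceNAS's objective.
# This shows one generic way to combine the three metrics it names (accuracy,
# parameter count, inference latency) into a single scalar for ranking
# candidates. All names, budgets, and weights below are hypothetical.
from dataclasses import dataclass

@dataclass
class Candidate:
    accuracy: float      # validation accuracy in [0, 1]
    params_m: float      # parameter count, in millions
    latency_ms: float    # measured on-device inference latency

def score(c: Candidate,
          params_budget: float = 5.0,    # hypothetical budget (M params)
          latency_budget: float = 20.0,  # hypothetical budget (ms)
          alpha: float = 0.1,            # weight of the size penalty
          beta: float = 0.1) -> float:   # weight of the latency penalty
    """Higher is better: reward accuracy, penalize budget overshoot."""
    size_penalty = max(0.0, c.params_m / params_budget - 1.0)
    latency_penalty = max(0.0, c.latency_ms / latency_budget - 1.0)
    return c.accuracy - alpha * size_penalty - beta * latency_penalty

# Rank a pool of candidates by the combined score.
pool = [
    Candidate(accuracy=0.93, params_m=4.2, latency_ms=18.0),
    Candidate(accuracy=0.95, params_m=9.8, latency_ms=41.0),
]
best = max(pool, key=score)
print(best)
```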
