Collaborative intelligence (CI) accelerates deep neural network (DNN) computations on IoT edge devices by offloading part of the DNN computation to high-performance cloud servers. Offloading requires transferring a large volume of feature data between the IoT edge device and the cloud servers, and the resulting transfer time adds significantly to the overall end-to-end latency. Despite the importance of this transfer time, CI research has largely overlooked how an unstable network environment affects it, which leads to suboptimal latency in real-world scenarios. In light of this, we propose CO-PILOT, a novel partitioning framework that aims to minimize end-to-end latency while preserving the original model accuracy. CO-PILOT achieves this by finely controlling various lossy mechanisms, such as compression level and transport protocol, and by strategically selecting error-robust layers as partition points, which improves resilience to both network conditions and the lossy mechanisms themselves. The results demonstrate that CO-PILOT can reduce end-to-end latency by up to 85% with less than 1% accuracy degradation, all without requiring model retraining.
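To make the latency trade-off concrete, the following minimal sketch models end-to-end latency as edge compute time plus feature transfer time plus cloud compute time, and picks the partition point that minimizes it for a given bandwidth. It is an illustration under stated assumptions, not CO-PILOT's algorithm: the names `PartitionCandidate`, `transfer_ms`, `end_to_end_ms`, and `best_partition`, the simple bandwidth-only transfer model, and all numbers are hypothetical, and the sketch does not model accuracy, packet loss, or the error robustness of layers.

```python
from dataclasses import dataclass


@dataclass
class PartitionCandidate:
    """One possible split point (all fields are hypothetical profiling values)."""
    layer: str          # name of the last layer executed on the edge device
    edge_ms: float      # time to run the layers up to and including `layer` on the edge
    cloud_ms: float     # time to run the remaining layers on the cloud
    feature_bytes: int  # size of the feature tensor sent at this split


def transfer_ms(num_bytes: int, bandwidth_mbps: float, compression_ratio: float = 1.0) -> float:
    """Transfer time in ms, assuming throughput equals the given bandwidth."""
    bits = num_bytes * 8 / compression_ratio
    return bits / (bandwidth_mbps * 1e6) * 1e3


def end_to_end_ms(c: PartitionCandidate, bandwidth_mbps: float,
                  compression_ratio: float = 1.0) -> float:
    """Edge compute + feature transfer + cloud compute."""
    return (c.edge_ms
            + transfer_ms(c.feature_bytes, bandwidth_mbps, compression_ratio)
            + c.cloud_ms)


def best_partition(candidates, bandwidth_mbps: float, compression_ratio: float = 1.0):
    """Return the candidate with the lowest modeled end-to-end latency."""
    return min(candidates,
               key=lambda c: end_to_end_ms(c, bandwidth_mbps, compression_ratio))


# Illustrative, made-up profile of three split points.
CANDIDATES = [
    PartitionCandidate("conv1", edge_ms=5.0, cloud_ms=20.0, feature_bytes=1_000_000),
    PartitionCandidate("conv3", edge_ms=30.0, cloud_ms=10.0, feature_bytes=250_000),
    PartitionCandidate("fc", edge_ms=80.0, cloud_ms=1.0, feature_bytes=4_000),
]
```

With these made-up numbers, the preferred split moves deeper into the network as bandwidth drops, because the smaller feature tensor outweighs the extra edge compute. This is why a partition chosen for one network condition can be far from optimal under another.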