Abstract
Partitioning a CNN and executing inference in parallel across multiple IoT devices has gained popularity as a way to meet real-time requirements without sacrificing model accuracy. However, existing algorithms struggle to find the optimal model partitioning granularity for complex CNNs. Moreover, scheduling inference across heterogeneous IoT devices is NP-hard when the structure of the CNN is a directed acyclic graph (DAG) rather than a chain. In this paper, we introduce DeepZoning, a versatile and cooperative inference framework that combines model and data parallelism to accelerate CNN inference. DeepZoning employs two algorithms at different levels: (1) a low-level Adaptive Workload Partition algorithm that uses linear programming and incorporates both the spatial and channel dimensions when searching for a feature-map distribution across heterogeneous devices, and (2) a high-level Model Partition algorithm that finds the optimal partitioning granularity and organizes complex CNNs into sequential zones to balance communication and computation during execution.
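
To illustrate the kind of workload-partition problem the low-level algorithm addresses, the sketch below formulates a minimal linear program that splits the rows of a feature map across heterogeneous devices so the slowest device finishes as early as possible. This is not DeepZoning's actual algorithm; the device speeds, the per-row cost model, and the omission of communication cost and the channel dimension are simplifying assumptions for illustration only.

import numpy as np
from scipy.optimize import linprog

def split_rows(total_rows, per_row_flops, device_flops_per_s):
    """Assign feature-map rows to devices to minimize the max completion time.

    Variables: x = [rows_0, ..., rows_{n-1}, T]; objective: minimize T.
    All quantities (speeds, per-row cost) are hypothetical.
    """
    n = len(device_flops_per_s)
    # Objective: minimize T, the last variable.
    c = np.zeros(n + 1)
    c[-1] = 1.0
    # Per-device time constraint: rows_i * per_row_flops / speed_i - T <= 0.
    A_ub = np.zeros((n, n + 1))
    for i, speed in enumerate(device_flops_per_s):
        A_ub[i, i] = per_row_flops / speed
        A_ub[i, -1] = -1.0
    b_ub = np.zeros(n)
    # Coverage constraint: all rows must be assigned.
    A_eq = np.zeros((1, n + 1))
    A_eq[0, :n] = 1.0
    b_eq = np.array([float(total_rows)])
    bounds = [(0, None)] * (n + 1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]

# Example: split 224 feature-map rows across three devices of different speeds.
rows, makespan = split_rows(224, per_row_flops=1e6,
                            device_flops_per_s=[4e9, 2e9, 1e9])
print(np.round(rows), makespan)

A practical partitioner would additionally round the row counts to integers, account for the cost of transferring input and output halos between devices, and, as the abstract notes, optimize over the channel dimension as well as the spatial one.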