Parallelizing CNN inference on heterogeneous edge clusters with data parallelism has gained popularity as a way to meet real-time requirements without sacrificing model accuracy. However, existing algorithms struggle to find the optimal parallel granularity for complex CNNs, whose structure is a directed acyclic graph (DAG) rather than a chain, and their parallel dimension is inflexible. Distributing the workload of modern CNNs across heterogeneous devices has also been proven to be NP-hard. In this paper, we introduce DeepZoning, a versatile cooperative inference framework that combines model and data parallelism to accelerate CNN inference. DeepZoning employs two algorithms at different levels: (1) a low-level Adaptive Workload Partition algorithm that uses linear programming and optimizes over both the spatial and channel dimensions when searching for a feature-map distribution across heterogeneous devices, and (2) a high-level Model Partition algorithm that finds the optimal model granularity and organizes complex CNNs into sequential zones to balance communication and computation during execution. Our experimental evaluation shows that DeepZoning is effective, achieving up to a 3.02× speedup on our experimental prototype compared with state-of-the-art algorithms.
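
To make the linear-programming workload partition concrete, the following is a minimal, illustrative sketch only, not the paper's actual formulation: it splits the spatial dimension (rows of one feature map) across heterogeneous devices so that the slowest device finishes as early as possible. The per-row costs, the row-wise split, and the omission of communication cost are all assumptions introduced here for illustration.

```python
# Illustrative LP sketch (assumed setup, not DeepZoning's exact model):
# distribute H feature-map rows across devices with different per-row costs
# so as to minimize the makespan T.
import numpy as np
from scipy.optimize import linprog

H = 224                                       # total feature-map rows (assumed)
per_row_cost = np.array([1.0, 2.5, 4.0])      # hypothetical ms per row on each device
n = len(per_row_cost)

# Variables: x_0..x_{n-1} = rows assigned to each device, plus T = makespan.
c = np.zeros(n + 1)
c[-1] = 1.0                                   # objective: minimize T

# per_row_cost[i] * x_i - T <= 0  (every device must finish within T)
A_ub = np.hstack([np.diag(per_row_cost), -np.ones((n, 1))])
b_ub = np.zeros(n)

# sum(x_i) = H  (all rows are assigned exactly once)
A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])
b_eq = np.array([float(H)])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (n + 1))
rows = res.x[:n]
print("rows per device:", np.round(rows, 1),
      "| estimated makespan (ms):", round(res.x[-1], 2))
```

In this toy setting the solver assigns more rows to the faster devices so their finish times equalize; the paper's algorithm additionally considers the channel dimension and communication costs, which this sketch deliberately omits.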