Dynamic networks have become a pivotal area of study in deep learning because they can selectively activate computing units (such as layers or channels) or dynamically allocate computation to information-rich regions, significantly reducing unnecessary computation by adapting to each input. Despite these advantages, the practical efficiency of dynamic models often falls short of their theoretical computational savings. This discrepancy arises from three primary challenges: 1) the lack of a unified framework across different dynamic inference paradigms, owing to a fragmented research landscape; 2) an excessive focus on algorithm design at the expense of scheduling strategies, which are essential for optimizing resource utilization on hardware; and 3) the difficulty of latency evaluation, since most current libraries cater to static operators. To tackle these issues, we introduce Latency-Aware Unified Dynamic Networks (LAUDNet), a general framework that integrates three fundamental dynamic paradigms (spatially-adaptive computation, layer skipping, and channel skipping) into a single unified formulation. LAUDNet not only refines algorithmic design but also improves scheduling through a latency predictor that efficiently and accurately estimates the inference latency of dynamic operators on specific hardware setups. Empirical assessments across multiple vision tasks (image classification, object detection, and instance segmentation) confirm that LAUDNet significantly narrows the gap between theoretical and practical efficiency. For instance, LAUDNet reduces the practical latency of its static counterpart, ResNet-101, by over 50% on hardware platforms such as V100, RTX 3090, and TX2 GPUs. Moreover, LAUDNet achieves a better accuracy-efficiency trade-off than competing methods.
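To make the unified formulation concrete, below is a minimal PyTorch-style sketch, not the authors' released code: the class, parameter names, and gate design are hypothetical. It illustrates how the three paradigms can reduce to one gating scheme, where a lightweight gate emits a binary mask whose shape determines whether computation is switched off per spatial location, per channel, or per layer.

```python
import torch
import torch.nn as nn

class UnifiedDynamicBlock(nn.Module):
    """Hypothetical sketch: wraps a static residual block `block` with a tiny
    gate that decides, per input, which parts of the block's output to keep.
    The three dynamic paradigms differ only in the shape of the binary mask."""
    def __init__(self, block: nn.Module, channels: int, mode: str = "spatial"):
        super().__init__()
        assert mode in ("spatial", "channel", "layer")
        self.block, self.mode = block, mode
        if mode == "spatial":        # one decision per location -> mask [N,1,H,W]
            self.gate = nn.Conv2d(channels, 1, kernel_size=1)
        elif mode == "channel":      # one decision per channel  -> mask [N,C,1,1]
            self.gate = nn.Linear(channels, channels)
        else:                        # one decision per layer    -> mask [N,1,1,1]
            self.gate = nn.Linear(channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.mode == "spatial":
            logits = self.gate(x)                      # gate on the feature map
        else:
            logits = self.gate(x.mean(dim=(2, 3)))     # gate on a pooled descriptor
            logits = logits[:, :, None, None]          # broadcastable over H, W
        mask = (logits > 0).float()                    # hard decision at inference
        return x + mask * self.block(x)                # skipped parts fall back to identity

# Usage: gate a shape-preserving residual block at channel granularity.
body = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                     nn.Conv2d(64, 64, 3, padding=1))
dyn = UnifiedDynamicBlock(body, channels=64, mode="channel")
y = dyn(torch.randn(2, 64, 32, 32))
```

Note that this sketch computes the block's output and zeroes the masked parts; in a real deployment the masked computation would instead be skipped via specialized gather/scatter kernels, and deciding which schedule is actually faster per operator and per device is where the latency predictor comes in.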