Abstract

Recent research shows that high-resolution networks provide representative multi-scale features for vision tasks. However, high-resolution architectures are weak at capturing long-range spatial information and are computationally expensive. To alleviate these problems and encourage engineering applications of dense prediction, this paper presents a novel efficient architecture for dense prediction built on the vanilla high-resolution network, namely the Dynamic context modeling based lightweight High-Resolution Network (Dynamic-HRNet). The network consists of two key components: (i) dynamic split convolution, a convolution that is more efficient and flexible than the standard convolution and can be conveniently embedded into other networks, and (ii) adaptive context modeling, which adaptively captures local and global contextual information in parallel to enrich the representation. From these two components, two lightweight blocks are designed that serve as the basic building units of Dynamic-HRNet, termed the dynamic multi-scale context block and the dynamic global context block, respectively. Extensive experiments demonstrate that Dynamic-HRNet, used as a backbone, achieves significant superiority on popular benchmarks for human pose estimation (70.6% AP on the COCO dataset and 87.6% PCKh on the MPII dataset) and semantic segmentation (75.3% mIoU on the Cityscapes dataset). Considering both efficiency and accuracy, it outperforms the most advanced lightweight architectures.
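The abstract does not include code, but the two ideas it names can be illustrated in miniature. Below is a hedged NumPy sketch, not the paper's implementation: `split_pointwise_conv` stands in for the split-convolution idea (channels divided into groups, each transformed independently, which cuts parameters versus a dense convolution), and `global_context_gate` stands in for context modeling via a global squeeze-and-gate. All function names and shapes here are illustrative assumptions.

```python
import numpy as np

def split_pointwise_conv(x, group_weights):
    """Split-style 1x1 convolution (illustrative stand-in for dynamic split
    convolution): channels of x (C, H, W) are divided into len(group_weights)
    groups, and each group is mixed by its own small weight matrix
    (C_out_g, C_in_g). A 4-way split of an 8->8 pointwise conv needs
    4 * (2*2) = 16 weights instead of 8*8 = 64."""
    groups = np.split(x, len(group_weights), axis=0)
    outs = [np.einsum('oc,chw->ohw', w, g) for w, g in zip(group_weights, groups)]
    return np.concatenate(outs, axis=0)

def global_context_gate(x):
    """Illustrative global-context modeling: squeeze the whole feature map to
    one descriptor per channel (global average pool), pass it through a
    sigmoid, and rescale each channel, so every position receives
    image-wide information at negligible cost."""
    ctx = x.mean(axis=(1, 2), keepdims=True)   # (C, 1, 1) global descriptor
    gate = 1.0 / (1.0 + np.exp(-ctx))          # sigmoid gate in (0, 1)
    return x * gate

# Hypothetical usage: 8-channel feature map, 4 groups of 2 channels each.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
weights = [rng.standard_normal((2, 2)) for _ in range(4)]
y = global_context_gate(split_pointwise_conv(x, weights))
```

The sketch conveys why such blocks are lightweight: grouping shrinks the weight count of the convolution, and the context gate adds global information with only a pooling step and an elementwise multiply.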
