Abstract

Deep feature fusion plays a significant role in the strong learning ability of convolutional neural networks (CNNs) for computer vision tasks. Recent work has repeatedly demonstrated the advantages of efficient aggregation strategies, some of which rely on multiscale representations. In this article, we describe a novel network architecture for high-level computer vision tasks in which densely connected feature fusion provides multiscale representations for a residual network. We term our method ResDNet: a simple and efficient backbone composed of sequential ResDNet modules, each containing a variant of the dense block called the sliding dense block (SDB). Compared with DenseNet, ResDNet enhances feature fusion and reduces redundancy through shallower densely connected architectures. Experimental results on three classification benchmarks, CIFAR-10, CIFAR-100, and ImageNet, demonstrate the effectiveness of ResDNet. ResDNet consistently outperforms DenseNet on CIFAR-100 while using far less computation. On ImageNet, ResDNet-B-129 achieves top-1 accuracy improvements of 1.94% and 0.89% over ResNet-50 and DenseNet-201, respectively, at similar complexity. Moreover, ResDNet with more than 1000 layers achieves remarkable accuracy on CIFAR compared with other state-of-the-art results. Based on the MMDetection implementation of RetinaNet, ResDNet-B-129 improves mAP from 36.3 to 39.5 over a ResNet-50 backbone on the COCO dataset.
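
To make the architectural idea concrete, the following is a minimal PyTorch sketch, not the authors' implementation. The abstract does not spell out the SDB mechanics, so this sketch assumes that a sliding dense block means each layer concatenates only the most recent `window` feature maps (rather than all preceding ones, as in a standard DenseNet block), and that a ResDNet module wraps the block with a residual connection. All names here (SlidingDenseBlock, ResDNetModule, window, growth_rate) are illustrative assumptions, not the paper's actual API.

    # Hypothetical sketch of a sliding dense block and a ResDNet module,
    # under the assumptions stated above.
    import torch
    import torch.nn as nn

    def conv_bn_relu(in_ch, out_ch):
        # 3x3 conv -> BN -> ReLU: the usual dense-layer unit.
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    class SlidingDenseBlock(nn.Module):
        # Dense block whose layers see only a sliding window of feature
        # maps, which shortens the dense connectivity and reduces the
        # channel redundancy of a full DenseNet block.
        def __init__(self, in_channels, growth_rate, num_layers, window=3):
            super().__init__()
            self.window = window
            self.layers = nn.ModuleList()
            for i in range(num_layers):
                # Channels visible to layer i: the input map plus up to
                # (window - 1) previous growth_rate-wide feature maps.
                if i < window:
                    in_ch = in_channels + i * growth_rate
                else:
                    in_ch = window * growth_rate
                self.layers.append(conv_bn_relu(in_ch, growth_rate))
            if num_layers >= window:
                self.out_channels = window * growth_rate
            else:
                self.out_channels = in_channels + num_layers * growth_rate

        def forward(self, x):
            features = [x]
            for layer in self.layers:
                # Concatenate only the most recent `window` feature maps.
                inp = torch.cat(features[-self.window:], dim=1)
                features.append(layer(inp))
            return torch.cat(features[-self.window:], dim=1)

    class ResDNetModule(nn.Module):
        # SDB output projected by a 1x1 conv and added to a residual
        # path, fusing dense feature aggregation with a residual network.
        def __init__(self, channels, growth_rate=32, num_layers=6, window=3):
            super().__init__()
            self.sdb = SlidingDenseBlock(channels, growth_rate, num_layers, window)
            self.project = nn.Conv2d(self.sdb.out_channels, channels,
                                     kernel_size=1, bias=False)

        def forward(self, x):
            return x + self.project(self.sdb(x))

    if __name__ == "__main__":
        m = ResDNetModule(channels=64)
        y = m(torch.randn(2, 64, 32, 32))
        print(y.shape)  # torch.Size([2, 64, 32, 32])

A full backbone would stack several such modules with downsampling transitions between them; the 1x1 projection keeps the residual addition shape-compatible regardless of the block's dense output width.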
