Abstract
The width of convolutional neural networks (CNNs) is crucial for improving performance. Many wide CNNs use a convolutional layer to fuse multiscale features or to fuse preceding features into subsequent features. However, these CNNs rarely use blocks, which consist of a series of successive convolutional layers, to fuse multiscale features. In this paper, we propose an approach for improving performance by fusing the low-level features extracted from different blocks. We utilize five different convolutions, namely 3×3, 5×5, 7×7, 5×3 ∪ 3×5 and 7×3 ∪ 3×7, to generate five low-level features, and we design two fusion strategies: low-level feature fusion (L-Fusion) and high-level feature fusion (H-Fusion). Experimental results show that L-Fusion is more helpful for improving the performance of CNNs and that the 5×5 convolution is more suitable for multiscale feature fusion. We summarize this conclusion as a strategy that fuses multiscale features in the early stage of CNNs. Furthermore, based on this strategy, we propose a new architecture that perceives the input of CNNs with two self-governed blocks. Finally, we verify the conclusion by modifying five off-the-shelf networks, DenseNet-BC (depth = 40), ALL-CNN-C (depth = 9), Darknet-19 (depth = 19), ResNet-18 (depth = 18) and ResNet-50 (depth = 50), with the proposed architecture, and the updated networks deliver more competitive results.
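The five-branch design described above lends itself to a compact sketch. The following is a minimal PyTorch illustration, assuming that ∪ denotes two parallel asymmetric convolutions whose outputs are summed and that L-Fusion concatenates the resulting low-level maps along the channel dimension; all class and variable names are illustrative, not from the paper.

# Hedged sketch of the five low-level feature branches. The reading of
# "∪" (parallel convolutions, outputs summed) and of L-Fusion (channel
# concatenation) are assumptions, not the paper's confirmed definitions.
import torch
import torch.nn as nn

class UnionConv(nn.Module):
    """Parallel k1 and k2 convolutions with summed outputs (assumed meaning of ∪)."""
    def __init__(self, in_ch, out_ch, k1, k2):
        super().__init__()
        self.a = nn.Conv2d(in_ch, out_ch, k1, padding=(k1[0] // 2, k1[1] // 2))
        self.b = nn.Conv2d(in_ch, out_ch, k2, padding=(k2[0] // 2, k2[1] // 2))
    def forward(self, x):
        return self.a(x) + self.b(x)

class MultiScaleLowLevel(nn.Module):
    """Five branches producing the five low-level feature maps."""
    def __init__(self, in_ch=3, out_ch=16):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3, padding=1),    # 3×3
            nn.Conv2d(in_ch, out_ch, 5, padding=2),    # 5×5
            nn.Conv2d(in_ch, out_ch, 7, padding=3),    # 7×7
            UnionConv(in_ch, out_ch, (5, 3), (3, 5)),  # 5×3 ∪ 3×5
            UnionConv(in_ch, out_ch, (7, 3), (3, 7)),  # 7×3 ∪ 3×7
        ])
    def forward(self, x):
        # L-Fusion (assumed): concatenate the low-level maps along channels.
        return torch.cat([b(x) for b in self.branches], dim=1)

x = torch.randn(1, 3, 32, 32)
print(MultiScaleLowLevel()(x).shape)  # torch.Size([1, 80, 32, 32])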
Highlights
CNNs [1] were first presented in 1989, and they have demonstrated excellent performance in many visual tasks such as semantic segmentation [2], [3], image classification [4], and object detection [5], [6].
One purpose of this paper is to study the advantages of multiscale feature fusion; in particular, we seek to answer whether large-scale features or multiscale feature fusion improves performance more.
In this paper, we divide a convolutional neural network (CNN) into different blocks according to the size of its feature maps to obtain low-level and high-level features for fusion, as sketched below.
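As a concrete illustration of that block division, the hedged sketch below partitions torchvision's ResNet-18 into stages at its spatial-downsampling boundaries; the paper's exact block boundaries and naming may differ.

# Hedged sketch: splitting a CNN into blocks by feature-map size, using
# torchvision's ResNet-18 as a stand-in for the paper's block definition.
import torch
from torchvision.models import resnet18

net = resnet18()
stages = [
    torch.nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool, net.layer1),
    net.layer2,  # the feature map halves in size at each subsequent stage
    net.layer3,
    net.layer4,
]
x = torch.randn(1, 3, 224, 224)
for i, stage in enumerate(stages):
    x = stage(x)
    print(f"stage {i}: {tuple(x.shape)}")  # low-level -> high-level features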
Summary
CNNs (convolutional neural networks) [1] were first presented in 1989, and they have demonstrated excellent performance in many visual tasks such as semantic segmentation [2], [3], image classification [4], and object detection [5], [6]. As hardware has developed, the performance of CNNs has increased dramatically thanks to greater computational capacity. Some classic models have validated that the depth of a CNN is pivotal to its performance [11], [4]. Many visual recognition tasks have benefited from very deep networks [12], [13]. A considerably deeper network achieves better results than a shallower one, and a higher-quality model can be obtained by increasing the depth.