Abstract

Most current classification backbones have a linear structure: stacking more convolution layers expands the receptive field and captures high-level semantic information, but such networks do not fully exploit the representations available at different depths. We therefore propose a twice-fused pyramid convolutional neural network. First, based on the characteristics of features at different convolution layers, we divide the network into three levels: shallow, medium, and deep. Then, through a reverse pyramid connection, lower-level features are learned a second time under the guidance of high-level semantics. To bridge the gap that direct fusion creates between different feature levels, we connect the pyramid layers with a twice-fusion scheme rather than a single raw merge. Because multi-scale context is important for semantic segmentation, we also insert an augmentation module between the pyramid feature layers that extracts features more densely than ASPP. In experiments, our model uses depthwise separable convolutions and outperforms the linear-structured MobileNetV1 in classification on the CIFAR and SVHN datasets while remaining lightweight. It also achieves good results on semantic segmentation, surpassing several mainstream models on the Cityscapes and CamVid datasets.
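The sketch below illustrates the general idea in PyTorch: a three-level backbone built from depthwise separable convolutions, with a reverse-pyramid path in which each higher level guides a second pass over the level below it via a two-step (align-then-refine) fusion. This is a minimal interpretation of the abstract, not the authors' published implementation; all module names (`TwiceFusion`, `TFPNet`), channel sizes, and the exact fusion steps are illustrative assumptions, and the denser-than-ASPP augmentation module is omitted for brevity.

```python
# Hypothetical sketch of a twice-fused reverse pyramid; names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def dw_separable(in_ch, out_ch, stride=1):
    """Depthwise separable convolution: depthwise 3x3 followed by pointwise 1x1."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1, groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class TwiceFusion(nn.Module):
    """Fuse a high-level (semantic) map into a lower-level map in two steps:
    first align channels/resolution and add, then refine the sum with a
    separable conv, rather than merging the two levels in one raw addition."""
    def __init__(self, low_ch, high_ch):
        super().__init__()
        self.align = nn.Conv2d(high_ch, low_ch, 1)   # first fusion: channel alignment + add
        self.refine = dw_separable(low_ch, low_ch)   # second fusion: smooth the merged features
    def forward(self, low, high):
        high = F.interpolate(self.align(high), size=low.shape[2:],
                             mode="bilinear", align_corners=False)
        return self.refine(low + high)

class TFPNet(nn.Module):
    """Three-level backbone (shallow/medium/deep) with a reverse-pyramid path
    that re-learns lower-level features under high-level semantic guidance."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.shallow = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1),
                                     nn.BatchNorm2d(32), nn.ReLU(inplace=True))
        self.medium = dw_separable(32, 64, stride=2)
        self.deep = dw_separable(64, 128, stride=2)
        self.fuse_md = TwiceFusion(64, 128)  # deep guides medium
        self.fuse_sm = TwiceFusion(32, 64)   # fused medium guides shallow
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(32, num_classes))
    def forward(self, x):
        s = self.shallow(x)
        m = self.medium(s)
        d = self.deep(m)
        m2 = self.fuse_md(m, d)   # second pass over medium features
        s2 = self.fuse_sm(s, m2)  # second pass over shallow features
        return self.head(s2)

logits = TFPNet()(torch.randn(2, 3, 32, 32))  # e.g. CIFAR-sized input
print(logits.shape)  # torch.Size([2, 10])
```

The two-step fusion is the key design choice this sketch tries to capture: a 1x1 alignment plus addition alone would merge semantically mismatched features directly, so a second refinement convolution is applied to the sum before it is passed further down the reverse pyramid.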
