Abstract

Recently, deep convolutional neural networks (CNNs) have achieved great success in semantic segmentation systems. In this paper, we show how to improve pixel-wise semantic segmentation by combining global context information with local image features. First, we implement a fusion layer that allows us to merge global features and local features in the encoder network. Second, in the decoder network, we introduce a stacked pooling block, which significantly expands the receptive fields of feature maps and is essential for contextualizing local semantic predictions. Furthermore, our approach is based on ResNet18, so our model has far fewer parameters than currently published models. The whole framework is trained in an end-to-end fashion without any post-processing. We show that our method improves semantic image segmentation on two datasets, CamVid and Cityscapes, demonstrating its effectiveness.
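
The abstract names two architectural components: a fusion layer that merges a global context descriptor with local encoder features, and a stacked pooling block in the decoder that enlarges receptive fields. The sketch below is a hypothetical PyTorch illustration of both ideas, not the authors' published implementation; the module names, kernel sizes, and the concatenate-then-1x1-convolution fusion are all assumptions made for illustration.

```python
# Hypothetical sketch (not the authors' code): fuse a global descriptor with
# local feature maps, then widen the receptive field with stacked pooling.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalLocalFusion(nn.Module):
    """Broadcast a global context vector over local feature maps and fuse with a 1x1 conv."""

    def __init__(self, channels: int):
        super().__init__()
        self.global_pool = nn.AdaptiveAvgPool2d(1)         # global context vector
        self.fuse = nn.Conv2d(2 * channels, channels, 1)   # merge global + local channels

    def forward(self, local_feat: torch.Tensor) -> torch.Tensor:
        g = self.global_pool(local_feat)                   # (N, C, 1, 1)
        g = g.expand_as(local_feat)                        # broadcast spatially
        return self.fuse(torch.cat([local_feat, g], dim=1))


class StackedPoolingBlock(nn.Module):
    """Chain average-pooling branches and sum their upsampled outputs to enlarge the receptive field."""

    def __init__(self, kernel_sizes=(2, 4, 8)):
        super().__init__()
        self.pools = nn.ModuleList(nn.AvgPool2d(k, stride=k) for k in kernel_sizes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = x
        for pool in self.pools:
            pooled = pool(x)
            out = out + F.interpolate(pooled, size=x.shape[2:],
                                      mode="bilinear", align_corners=False)
        return out


if __name__ == "__main__":
    feat = torch.randn(1, 64, 32, 32)          # dummy ResNet18-style feature map
    fused = GlobalLocalFusion(64)(feat)
    widened = StackedPoolingBlock()(fused)
    print(fused.shape, widened.shape)          # both torch.Size([1, 64, 32, 32])
```

Summing upsampled pooled branches is only one plausible way to aggregate context at multiple scales; the paper's stacked pooling block may differ in its exact pooling sizes and merging scheme.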
