Abstract

Semantic image segmentation is one of the most challenging tasks in computer vision. In this paper, we propose a highly fused convolutional network that consists of three parts: downsampling, fused upsampling, and multiple predictions. We adopt a VGG-based downsampling structure, followed by multiple upsampling steps. Feature maps from each pair of corresponding pooling and unpooling layers are combined. We also produce multiple pre-outputs, each generated from an unpooling layer by a one-step upsampling operation, and concatenate these pre-outputs to obtain the final output. As a result, the proposed network makes extensive use of feature information by fusing and reusing features from the lower layers. In addition, when training the model, we apply multiple soft cost functions to the pre-outputs and the final output, which reduces the attenuation of the loss signal during backpropagation. We evaluate our model on three public segmentation datasets: CamVid, PASCAL VOC, and ADE20K. We achieve competitive segmentation performance on the PASCAL VOC and ADE20K datasets, and state-of-the-art performance on the CamVid dataset.
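To make the described architecture concrete, below is a minimal PyTorch sketch of the structure outlined in the abstract: a VGG-style encoder that records pooling indices, index-based unpooling fused with the matching encoder feature maps, one pre-output per decoder stage upsampled in a single step to full resolution, concatenation of the pre-outputs into the final prediction, and a cost function on every pre-output as well as on the final output. All names, channel widths, and the fusion-by-concatenation choice (FusedSegNet, chs, pre_heads, etc.) are illustrative assumptions, not the paper's implementation, and the exact "soft" weighting of the cost functions is not specified in the abstract, so a plain sum is used here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusedSegNet(nn.Module):
    # Minimal sketch: VGG-style encoder, index-based unpooling fused
    # (here, concatenated) with the matching encoder features, and one
    # pre-output per decoder stage. Names and widths are illustrative.
    def __init__(self, num_classes=12, chs=(64, 128, 256)):
        super().__init__()
        in_ch, self.enc = 3, nn.ModuleList()
        for c in chs:
            self.enc.append(nn.Sequential(
                nn.Conv2d(in_ch, c, 3, padding=1), nn.ReLU(inplace=True)))
            in_ch = c
        # After unpooling at depth d we concatenate the encoder features
        # from that depth, so decoder input channels are doubled.
        rev = list(reversed(chs))
        dec_out = rev[1:] + [rev[-1]]
        self.dec = nn.ModuleList(
            nn.Sequential(nn.Conv2d(2 * ci, co, 3, padding=1),
                          nn.ReLU(inplace=True))
            for ci, co in zip(rev, dec_out))
        # One 1x1 classifier per decoder stage: the "pre-outputs".
        self.pre_heads = nn.ModuleList(
            nn.Conv2d(c, num_classes, 1) for c in dec_out)
        self.final_head = nn.Conv2d(num_classes * len(chs), num_classes, 1)

    def forward(self, x):
        size, feats, indices = x.shape[-2:], [], []
        for block in self.enc:
            x = block(x)
            feats.append(x)  # pre-pool features, fused again on the way up
            x, idx = F.max_pool2d(x, 2, return_indices=True)
            indices.append(idx)
        pre_outs = []
        for d, (dec, head) in enumerate(zip(self.dec, self.pre_heads)):
            depth = len(self.enc) - 1 - d
            x = F.max_unpool2d(x, indices[depth], 2)  # mirror the pooling step
            x = torch.cat([x, feats[depth]], dim=1)   # fuse encoder features
            x = dec(x)
            # Pre-output: one-step upsampling of this stage to full resolution.
            pre_outs.append(F.interpolate(head(x), size=size,
                                          mode='bilinear', align_corners=False))
        final = self.final_head(torch.cat(pre_outs, dim=1))
        return pre_outs, final

# Training with multiple cost functions: one cross-entropy term per
# pre-output plus one on the final output (a plain sum; the paper's
# "soft" weighting scheme is not detailed in the abstract).
model = FusedSegNet(num_classes=12)
images = torch.randn(2, 3, 64, 64)
labels = torch.randint(0, 12, (2, 64, 64))
pre_outs, final = model(images)
loss = sum(F.cross_entropy(p, labels) for p in pre_outs)
loss = loss + F.cross_entropy(final, labels)
loss.backward()
```

Supervising every pre-output in this way gives each decoder stage a direct gradient path, which is one plausible reading of how the multiple cost functions counteract loss attenuation in backpropagation.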
