Abstract

Recently, exploring features from different layers in fully convolutional networks (FCNs) has gained substantial attention as a way to capture context information for semantic segmentation. This paper presents a novel encoder-decoder architecture, called the contextual ensemble network (CENet), for semantic segmentation, in which contextual cues are aggregated by densely upsampling the convolutional features of the deep layers to the shallow deconvolutional layers. The proposed CENet is trained end-to-end to produce segmentation maps that match the resolution of the input image, and allows us to fully explore contextual features through an ensemble of dense deconvolutions. We evaluate CENet on two widely used semantic segmentation datasets: PASCAL VOC 2012 and Cityscapes. The experimental results demonstrate that CENet achieves superior performance compared with recent state-of-the-art results. Furthermore, we also evaluate CENet on the MS COCO dataset and the ISBI 2012 dataset for the tasks of instance segmentation and biological image segmentation, respectively. The results show that CENet obtains promising performance on these two datasets.
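To make the decoder design concrete, the sketch below shows one way a CENet-style "contextual ensemble" stage could be realized in PyTorch: deep encoder features are densely upsampled to the resolution of each shallow deconvolutional stage and fused with it, and the final logits are upsampled to the input resolution. The channel widths, the 1x1 projections, and summation as the fusion operation are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextualEnsembleDecoder(nn.Module):
    """Sketch of a CENet-style decoder: deep-layer features are densely
    upsampled and ensembled with every shallow deconvolutional stage.
    Hypothetical configuration; not the authors' released implementation."""

    def __init__(self, deep_channels, stage_channels, num_classes=21):
        super().__init__()
        # Project the deep features to each decoder stage's width (assumed 1x1 convs).
        self.projections = nn.ModuleList(
            nn.Conv2d(deep_channels, c, kernel_size=1) for c in stage_channels
        )
        # Deconvolution (transposed-conv) stages that progressively restore resolution.
        channels = [deep_channels] + list(stage_channels)
        self.deconvs = nn.ModuleList(
            nn.ConvTranspose2d(channels[i], channels[i + 1],
                               kernel_size=4, stride=2, padding=1)
            for i in range(len(stage_channels))
        )
        self.classifier = nn.Conv2d(stage_channels[-1], num_classes, kernel_size=1)

    def forward(self, deep_feat, input_size):
        x = deep_feat
        for proj, deconv in zip(self.projections, self.deconvs):
            x = F.relu(deconv(x))
            # Densely upsample the deep-layer context to this stage's resolution
            # and ensemble it with the deconvolutional features (here by summation).
            context = F.interpolate(proj(deep_feat), size=x.shape[2:],
                                    mode="bilinear", align_corners=False)
            x = x + context
        logits = self.classifier(x)
        # Match the resolution of the input image for end-to-end segmentation.
        return F.interpolate(logits, size=input_size, mode="bilinear",
                             align_corners=False)


# Example: fuse a 1/16-resolution deep feature map back to full resolution.
deep = torch.randn(1, 512, 32, 32)
decoder = ContextualEnsembleDecoder(512, [256, 128, 64, 32], num_classes=21)
out = decoder(deep, input_size=(512, 512))
print(out.shape)  # torch.Size([1, 21, 512, 512])
```

Summation is used here as the simplest fusion choice; concatenation followed by a convolution would be an equally plausible reading of "ensemble" and only changes the channel bookkeeping.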
