Abstract

In this work, we investigate how a cascade of dilated convolutions and a deep network architecture operating on multi-resolution input images affect the accuracy of semantic segmentation. We show that a cascade of dilated convolutions not only captures larger context efficiently, without increasing computational cost, but also improves localization performance. In addition, the deep architecture for multi-resolution input images increases segmentation accuracy by aggregating multi-scale contextual information. Furthermore, our fully convolutional neural network is coupled with a fully connected conditional random field model to remove isolated false positives and to refine predictions along object boundaries. We present experiments on two challenging image segmentation datasets, showing substantial improvements over strong baselines.
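
To make the first idea concrete, the sketch below shows a generic cascade of dilated 3x3 convolutions in PyTorch. It is not the authors' exact configuration; the dilation rates, channel count, and depth are illustrative assumptions, chosen only to show how stacking increasing dilation rates enlarges the receptive field while the parameter count per layer and the feature-map resolution stay fixed.

```python
# Illustrative sketch (assumed configuration, not the paper's exact network):
# a cascade of dilated 3x3 convolutions with exponentially growing dilation.
import torch
import torch.nn as nn


class DilatedCascade(nn.Module):
    def __init__(self, channels: int = 64, dilations=(1, 2, 4, 8)):
        super().__init__()
        layers = []
        for d in dilations:
            # padding = dilation keeps the spatial resolution unchanged, so the
            # receptive field grows without pooling/striding and without losing
            # localization accuracy.
            layers.append(nn.Conv2d(channels, channels, kernel_size=3,
                                    padding=d, dilation=d))
            layers.append(nn.ReLU(inplace=True))
        self.cascade = nn.Sequential(*layers)

    def forward(self, x):
        return self.cascade(x)


if __name__ == "__main__":
    x = torch.randn(1, 64, 128, 128)   # hypothetical input feature map
    y = DilatedCascade()(x)
    print(y.shape)                     # same spatial size: (1, 64, 128, 128)
```

Each 3x3 layer has the same number of weights regardless of its dilation rate, which is why the cascade captures larger context without additional computational cost per layer.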
