Abstract

Producing a better segmentation mask is crucial in scene understanding. Semantic segmentation is a vital task for applications such as autonomous driving, robotics, and medical image understanding. Efficient manipulation of high- and low-level context is key to competent pixel-level classification. An image's high-level feature map helps establish the spatial configuration of objects in the segmentation, while its low-level features help discern the boundaries of objects in the segmentation map. In our implementation, we use a two-bridge network. The first bridge manipulates the subtle differences between images and produces a vector that captures the low-level features of the input images. The second bridge produces a global contextual aggregation of the image, gathering a better understanding of its high-level features. The backbone is a dilated residual network, which avoids attrition of the image size during feature extraction. We train our network on the Cityscapes and ADE20K datasets and compare our results with state-of-the-art models. Initial experiments yielded a mean IoU of 70.1% and a pixel accuracy of 94.4% on the Cityscapes dataset, and 34.6% on the ADE20K dataset.
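The abstract does not give implementation details, so the following is only a minimal PyTorch sketch of how such a two-bridge design over a dilated residual backbone could be wired together. The module names (LowLevelBridge, ContextBridge, TwoBridgeSegNet), the ASPP-style dilated branches in the context bridge, and the sigmoid-gated fusion of the low-level vector are illustrative assumptions, not the authors' released architecture.

```python
# Hypothetical sketch of a two-bridge segmentation network; names and fusion
# strategy are assumptions made for illustration, not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50


class LowLevelBridge(nn.Module):
    """Compresses an early feature map into a vector summarising low-level cues."""
    def __init__(self, in_channels, dim=256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(in_channels, dim, kernel_size=1),
            nn.AdaptiveAvgPool2d(1),          # global pooling -> one vector per image
        )

    def forward(self, x):
        return self.proj(x)                   # shape: (N, dim, 1, 1)


class ContextBridge(nn.Module):
    """Aggregates global context from the high-level feature map (ASPP-style assumption)."""
    def __init__(self, in_channels, dim=256, rates=(1, 6, 12)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_channels, dim, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])
        self.fuse = nn.Conv2d(dim * len(rates), dim, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))


class TwoBridgeSegNet(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        backbone = resnet50(weights=None)
        # Dilate the last two stages so the feature map is not downsampled further,
        # mimicking a dilated residual network backbone.
        for stage, rate in ((backbone.layer3, 2), (backbone.layer4, 4)):
            for m in stage.modules():
                if isinstance(m, nn.Conv2d) and m.stride == (2, 2):
                    m.stride = (1, 1)
                if isinstance(m, nn.Conv2d) and m.kernel_size == (3, 3):
                    m.dilation, m.padding = (rate, rate), (rate, rate)
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu,
                                  backbone.maxpool, backbone.layer1)
        self.body = nn.Sequential(backbone.layer2, backbone.layer3, backbone.layer4)
        self.low = LowLevelBridge(256)        # layer1 output channels in ResNet-50
        self.context = ContextBridge(2048)    # layer4 output channels in ResNet-50
        self.classifier = nn.Conv2d(256, num_classes, kernel_size=1)

    def forward(self, x):
        low_feat = self.stem(x)
        high_feat = self.body(low_feat)
        context = self.context(high_feat)
        # Modulate global context with the low-level vector, then classify and upsample.
        context = context * torch.sigmoid(self.low(low_feat))
        logits = self.classifier(context)
        return F.interpolate(logits, size=x.shape[-2:], mode="bilinear",
                             align_corners=False)


if __name__ == "__main__":
    model = TwoBridgeSegNet(num_classes=19)   # 19 evaluation classes for Cityscapes
    out = model(torch.randn(1, 3, 256, 512))
    print(out.shape)                          # torch.Size([1, 19, 256, 512])
```

In this sketch, converting the last two ResNet stages to dilated convolutions keeps the high-level features at 1/8 of the input resolution, which is what lets the backbone extract deep features without the attrition of spatial size mentioned in the abstract.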
