Abstract

Different approaches have been proposed to design deep CNNs for semantic segmentation. They are usually built upon an encoder–decoder architecture and require computationally expensive operations on high-resolution activation maps. Because computational cost is critical for real-time segmentation, efficient approaches compromise spatial information to reach real-time speed, at the price of a considerable drop in accuracy. We introduce a new module based on depthwise separable, shuffled and grouped convolutions that optimizes up-sampling operations by using a sizeable receptive field and preserving spatial information. We then design an efficient network based on dense connectivity to achieve a remarkable trade-off between accuracy and speed. We show through a set of experiments that, even when up-sampling with a lightweight decoder, our architecture scores 69.5% Mean IoU on the Cityscapes test set with $$1024\times 512$$ inputs at 95.2 FPS.
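To make the cost argument concrete, the following is a minimal sketch of why depthwise separable and grouped convolutions are cheaper than a standard convolution, and of what a channel shuffle does between grouped layers. It is not the module proposed in the paper: the helper names (`conv_params`, `depthwise_separable_params`, `channel_shuffle`) and the 64-channel, 3×3 configuration are assumptions chosen only for illustration.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution (bias ignored)."""
    assert c_in % groups == 0 and c_out % groups == 0
    # Each output channel only sees c_in / groups input channels.
    return (c_in // groups) * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weight count of a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = conv_params(c_in, c_in, k, groups=c_in)  # one filter per channel
    pointwise = conv_params(c_in, c_out, 1)              # mixes channels
    return depthwise + pointwise


def channel_shuffle(channels, groups):
    """Interleave channels across groups (reshape, transpose, flatten)."""
    assert len(channels) % groups == 0
    per_group = len(channels) // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


# Illustrative configuration (assumed, not taken from the paper).
standard = conv_params(64, 64, 3)                  # 36864 weights
grouped = conv_params(64, 64, 3, groups=4)         # 9216 weights
separable = depthwise_separable_params(64, 64, 3)  # 4672 weights
shuffled = channel_shuffle(list(range(6)), groups=2)  # [0, 3, 1, 4, 2, 5]
```

In this assumed setting the separable layer uses roughly one eighth of the weights of the standard layer, and the shuffle lets information cross group boundaries that grouped convolutions would otherwise keep isolated.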
