Abstract

Semantic segmentation is a fundamental task in computer vision and is widely used in industry. However, current state-of-the-art architectures are usually computationally heavy, which makes it hard for them to meet real-time requirements and prevents their deployment in industrial settings. In this paper, we propose a lightweight network for fast segmentation. Our network follows an encoder-decoder design: it encodes rich spatial information in the shallow layers and captures sufficient semantic information in the deep layers. In the decoder, we use an attention mechanism to re-weight features and gradually fuse high-level features back into low-level features. We evaluate our network on the Cityscapes dataset. Our method achieves 68.0% mean intersection over union (mIoU) and runs at 50.7 frames per second at full resolution (1024×2048) on a single NVIDIA GeForce GTX 1080Ti card.
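For readers unfamiliar with the accuracy metric quoted above, the following is a minimal sketch of how mean intersection over union can be computed from flattened label maps. It is not the evaluation code used in this work: the function name `mean_iou`, its parameters, and the ignore label value 255 are assumptions made for illustration, and benchmark tooling typically accumulates the counts over the whole dataset rather than per image.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that occur in the prediction or the ground truth.

    pred, target: equal-length sequences of integer class labels (flattened
    label maps). Pixels whose ground-truth label equals ignore_index are
    skipped. The default of 255 is an illustrative assumption.
    """
    intersection = [0] * num_classes  # true positives per class
    union = [0] * num_classes         # TP + FP + FN per class
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class
    # Average only over classes that actually appear.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with two classes; the last pixel is ignored.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 255], num_classes=2))
```

In the toy example each class has one correctly labeled pixel and a union of two pixels, so both per-class IoU values are 0.5, and so is their mean.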
