Abstract
Semantic segmentation is a fundamental task in computer vision and is widely used in industry. However, current state-of-the-art architectures usually bring heavy computation complexity, making it hard to meet the demand for real-time, and can not be implemented in industry. In this paper, we propose a lightweight network to complete fast segmentation. Our network follows encoder-decoder style, which encodes rich spatial information at shallow layers and gains sufficient semantic information at deep layers. At the decoder part, we use attention mechanism to re-weight features and gradually fuse high-level features back to low-level features. We evaluate our network on Cityscapes dataset. Our method achieves an accuracy of 68.0 % mean intersection over union, and runs at 50.7 frames per second at full resolution (1024x2048) on one NVIDIA GeForce GTX 1080Ti card.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have