Abstract

Semantic segmentation is increasingly being applied on mobile devices due to advancements in mobile chipsets, particularly in low-power consumption scenarios. However, the lightweight design of mobile devices poses limitations on the receptive field, which is crucial for dense prediction problems. Existing approaches have attempted to balance lightweight designs and high accuracy by downsampling features in the backbone. However, this downsampling may result in the loss of local details at each network stage. To address this challenge, this paper presents a novel solution in the form of a compact and efficient convolutional neural network (CNN) for real-time applications: our proposed model, local spatial perception convolution (LSPConv). Furthermore, the effectiveness of our architecture is demonstrated on the Cityscapes dataset. The results show that our model achieves an impressive balance between accuracy and inference speed. Specifically, our LightSeg, which does not rely on ImageNet pretraining, achieves an mIoU of 76.1 at a speed of 61 FPS on the Cityscapes validation set, utilizing an RTX 2080Ti GPU with mixed precision. Additionally, it achieves a speed of 115.7 FPS on the Jetson NX with int8 precision.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.