Abstract

Visible–infrared urban road scene parsing is attracting increasing attention because it can extract complementary cues from the visible and infrared imaging modalities. However, most existing parsing methods adopt complicated models, which incur large computational costs and limit real-time performance. Moreover, many parsing methods fail to fully exploit high-level semantic information, which considerably undermines parsing accuracy. To address these problems, we introduce a lightweight, high-performance network called the cross-guided contextual perceptive network (CCPNet). A lightweight backbone equipped with adaptive refined fusion modules keeps CCPNet small, and a cross-guided contextual perceptive module extracts and enhances semantic cues from high-level features. Experimental results show that CCPNet achieves state-of-the-art performance in visible–infrared scene parsing with few parameters (7.34 million), a small model size (29.9 MB), and real-time inference (50.03 fps). The CCPNet code and results are available at: https://github.com/Jinfu0913/CCPNet.
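The abstract does not specify module internals, so the following is a minimal, self-contained PyTorch sketch of the kind of architecture it describes: one lightweight encoder stream per modality, a stage-wise fusion block that adaptively mixes the two streams, and a context module over the highest-level fused features. Every design detail here (the gating inside `AdaptiveRefinedFusion`, the dilated branches in `CrossGuidedContext`, the channel widths, and the 9-class output) is an illustrative assumption, not CCPNet's actual implementation.

```python
# Hypothetical sketch of a two-stream visible-infrared parsing network in the
# spirit of the abstract; all module internals are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvBNReLU(nn.Sequential):
    """3x3 conv -> batch norm -> ReLU, a standard lightweight building block."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__(
            nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )


class AdaptiveRefinedFusion(nn.Module):
    """Assumed fusion block: gate each modality with channel weights, then mix."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * ch, ch, 1),
            nn.Sigmoid(),
        )

    def forward(self, rgb, ir):
        w = self.gate(torch.cat([rgb, ir], dim=1))  # per-channel fusion weights
        return w * rgb + (1 - w) * ir


class CrossGuidedContext(nn.Module):
    """Assumed context module: dilated branches where each guides the next."""
    def __init__(self, ch):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in (1, 2, 4)
        )
        self.project = ConvBNReLU(3 * ch, ch)

    def forward(self, x):
        outs, guide = [], x
        for branch in self.branches:  # each branch refines the previous output
            guide = F.relu(branch(guide))
            outs.append(guide)
        return self.project(torch.cat(outs, dim=1))


class TwoStreamParser(nn.Module):
    """Per-modality encoders, stage-wise fusion, context head, upsampled logits."""
    def __init__(self, num_classes=9, widths=(16, 32, 64)):  # assumed sizes
        super().__init__()
        def stream():
            layers, in_ch = [], 3
            for w in widths:
                layers.append(ConvBNReLU(in_ch, w, stride=2))  # downsample x2
                in_ch = w
            return nn.ModuleList(layers)
        self.rgb_stream, self.ir_stream = stream(), stream()
        self.fusions = nn.ModuleList(AdaptiveRefinedFusion(w) for w in widths)
        self.context = CrossGuidedContext(widths[-1])
        self.head = nn.Conv2d(widths[-1], num_classes, 1)

    def forward(self, rgb, ir):
        fused = None
        for rgb_enc, ir_enc, fuse in zip(self.rgb_stream, self.ir_stream, self.fusions):
            rgb, ir = rgb_enc(rgb), ir_enc(ir)
            fused = fuse(rgb, ir)
            rgb, ir = rgb + fused, ir + fused  # inject fused cues into both streams
        logits = self.head(self.context(fused))
        # Restore full resolution (the three stride-2 stages downsampled by 8).
        return F.interpolate(logits, scale_factor=8, mode="bilinear", align_corners=False)


if __name__ == "__main__":
    net = TwoStreamParser()
    rgb = torch.randn(1, 3, 480, 640)  # visible image
    ir = torch.randn(1, 3, 480, 640)   # infrared image, replicated to 3 channels
    print(net(rgb, ir).shape)          # torch.Size([1, 9, 480, 640])
```

The stage-wise fusion-and-reinjection pattern shown here is one common way to let complementary modality cues flow between streams at low cost; the paper itself should be consulted for how CCPNet's adaptive refined fusion and cross-guided contextual perceptive modules are actually built.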
