Abstract

Traffic sign recognition (TSR) has made significant progress in recent years, and both convolutional neural network (CNN)-based and transformer-based models have been widely explored. Combining a CNN with a transformer allows a model to use both local and global information when classifying a sign. However, this hybrid approach still suffers from the quadratic complexity of the transformer, which prevents it from reaching its full performance. Recently, a state space model (SSM)-based architecture called Mamba has been proposed, which excels at long-range modelling while maintaining linear complexity. Applied directly to TSR, however, the Mamba architecture performs poorly because it does not fully exploit local features, which are crucial for recognizing the details of traffic signs. In this paper, we explore the potential of this SSM-based model for TSR from both efficiency and effectiveness perspectives, and we design a customized architecture, MambaTSR, with ∼90k parameters and ∼1.4 ms processing time. Specifically, at the macro level we use patch embedding and a four-stage encoder, while at the micro level we employ three-stream adaptive mining embedding (TAME) to obtain local information and four efficient visual state space (EVSS) blocks to explore global associations. Experiments on German, Chinese, and Indian datasets show that our method achieves the best performance while reducing parameters by ∼89 % and processing time by ∼58 % compared to the state-of-the-art method. The code is available at https://github.com/1024AILab/MambaTSR.
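To make the linear-complexity claim concrete, the following is a minimal sketch of a discretized state space recurrence scanned over a sequence of patch features. It is not the EVSS block or the MambaTSR implementation: the function name `ssm_scan` and the fixed parameters `a`, `b`, `c` are assumptions chosen for illustration, and Mamba itself makes these parameters input-dependent (selective). The point is only that each sequence element is visited once, so the cost grows linearly with sequence length, whereas self-attention compares every pair of elements.

```python
def ssm_scan(xs, a, b, c):
    """Scan a diagonal, discretized state space model over a 1-D sequence.

    Recurrence (elementwise over the state dimension):
        h_t = a * h_{t-1} + b * x_t
        y_t = sum(c * h_t)

    xs      : list of scalar inputs (e.g. one channel of flattened patches)
    a, b, c : lists of equal length, one entry per state dimension
              (hypothetical fixed parameters; Mamba learns input-dependent ones)
    """
    h = [0.0] * len(a)          # hidden state starts at zero
    ys = []
    for x in xs:                # one pass over the sequence: O(len(xs))
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


# An impulse at the first position decays geometrically through the state,
# which is how information from early patches reaches later ones.
outputs = ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0])
print(outputs)
```

With a single state dimension and `a = 0.5`, the impulse response halves at each step, showing how the state carries long-range context without any pairwise comparison between sequence positions.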
