Abstract
Traffic detection (including lane detection and traffic sign detection) is one of the key technologies for realizing driver-assistance and autonomous driving systems. However, most existing detection methods are designed for single-modal visible-light data; when scene lighting changes dramatically (such as insufficient illumination at night), it is difficult for these methods to obtain good detection results. Since multi-modal data can provide complementary discriminative information, this paper proposes, based on the YOLOv5 model, a multi-modal fusion YOLOv5 network that consists of three key components: a dual-stream feature extraction module, a correlation feature extraction module, and a self-attention fusion module. Specifically, the dual-stream feature extraction module extracts features from each of the two modalities. The features learned by the dual-stream module are then fed into the correlation feature extraction module to learn maximally correlated features. Next, the extracted maximum-correlation features are used to exchange information between modalities through a self-attention mechanism, yielding fused features. Finally, the fused features are passed to the detection layer to obtain the final detection result. Experimental results on different traffic detection tasks demonstrate the superiority of the proposed method.
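The cross-modal self-attention fusion step described above can be sketched as follows. This is a minimal, dependency-free illustration only, not the paper's implementation: it assumes each modality is a list of token vectors, concatenates the two token sequences so every token can attend to tokens of both modalities, and uses identity projections for queries, keys, and values (a real module would use learned projection weights). All function and variable names here are illustrative.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention_fuse(visible_feats, infrared_feats):
    """Toy single-head self-attention over the concatenated tokens of
    two modalities (e.g. visible-light and infrared feature vectors).

    Because attention runs over the joint token sequence, each output
    token is a convex combination of tokens from BOTH modalities, which
    is the information exchange the fusion module is meant to perform.
    """
    tokens = visible_feats + infrared_feats  # joint cross-modal sequence
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)               # scaled dot-product attention
    fused = []
    for q in tokens:
        # Similarity of this query token against every token of both modalities.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) * scale for k in tokens]
        weights = softmax(scores)
        # Weighted sum of value vectors (values == tokens under identity projection).
        out = [sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)]
        fused.append(out)
    return fused
```

For example, fusing a single visible-light token `[1.0, 0.0]` with a single infrared token `[0.0, 1.0]` returns two output tokens, each mixing both inputs, with each token attending most strongly to itself.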