Abstract

Reliable autonomous driving requires comprehensive environment perception, and road recognition in particular is critical for autonomous vehicles to achieve adaptability, reliability, and safety. Widely equipped sensors such as cameras, LiDAR, and accelerometers have been adopted for road recognition. However, single-sensor recognition methods struggle to balance high accuracy with adaptability. In this study, we propose a visual-tactile fusion road recognition system for autonomous vehicles. The visual modality is derived from captured road images, and the tactile modality comes from a purpose-designed intelligent tire system containing a low-cost piezoelectric sensor. For accurate road recognition, we propose a multimodal fusion recognition network based on a CNN-Transformer architecture. The visual and tactile inputs are fed into modality-specific SE-CNNs, which emphasize informative input channels to obtain weighted features. These features are then passed to "bottleneck"-based fusion Transformer encoders, which output the recognition results. We also design a fusion feature extractor to strengthen the fused representation and improve accuracy. Vehicle field experiments were conducted to build a dataset covering four road surfaces, and the results show that the proposed network achieves 99.48% accuracy on the road recognition task.
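To make the channel-weighting step concrete, below is a minimal squeeze-and-excitation block of the kind the abstract's SE-CNNs refer to. This is a generic sketch of the standard SE mechanism (Hu et al., 2018) in PyTorch, not the authors' implementation; the class name `SEBlock`, the reduction ratio, and all dimensions are illustrative assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Generic squeeze-and-excitation channel attention.

    Hypothetical sketch: the paper's exact SE-CNN layout is not given
    in the abstract, so layer sizes here are assumptions.
    """
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, H, W) feature maps from a CNN stage
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))           # squeeze: global average pool -> (b, c)
        w = self.fc(w).view(b, c, 1, 1)  # excite: per-channel weights in (0, 1)
        return x * w                     # reweight channels, emphasizing valuable ones
```

Inserted after convolutional stages, such a block lets each modality's CNN learn which channels carry the most road-relevant information before fusion.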
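The "bottleneck"-based fusion step can likewise be sketched. The snippet below follows the attention-bottleneck idea (a small set of shared tokens mediating cross-modal exchange, as in multimodal bottleneck Transformers); whether the paper's encoder matches this exactly cannot be confirmed from the abstract, and the class name `BottleneckFusion`, token counts, and dimensions are assumptions.

```python
class BottleneckFusion(nn.Module):
    """One fusion layer: each modality's tokens are encoded together with a
    small set of shared bottleneck tokens that carry cross-modal information.

    Hypothetical sketch under assumed dimensions; not the authors' code.
    """
    def __init__(self, dim: int = 256, n_bottleneck: int = 4, n_heads: int = 4):
        super().__init__()
        self.bottleneck = nn.Parameter(torch.randn(1, n_bottleneck, dim))
        self.enc_v = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.enc_t = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.n_b = n_bottleneck

    def forward(self, vis: torch.Tensor, tac: torch.Tensor):
        # vis: (b, Nv, dim) visual tokens; tac: (b, Nt, dim) tactile tokens
        b = vis.size(0)
        btl = self.bottleneck.expand(b, -1, -1)
        # Visual stream updates the shared bottleneck tokens...
        out_v = self.enc_v(torch.cat([vis, btl], dim=1))
        vis, btl = out_v[:, :-self.n_b], out_v[:, -self.n_b:]
        # ...which then inject visual context into the tactile stream.
        out_t = self.enc_t(torch.cat([tac, btl], dim=1))
        tac, btl = out_t[:, :-self.n_b], out_t[:, -self.n_b:]
        return vis, tac, btl
```

Forcing all cross-modal traffic through a handful of bottleneck tokens keeps fusion cheap relative to full cross-attention while still letting each modality condition on the other.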
