Abstract

In this paper, we study the application of Vehicle-to-Everything (V2X) communication to improve the perception performance of autonomous vehicles. We present V2X-ViTs, a robust cooperative perception framework with V2X communication built on novel vision Transformer models. First, we present V2X-ViTv1, which contains holistic attention modules that effectively fuse information across on-road agents (i.e., vehicles and infrastructure). Specifically, V2X-ViTv1 consists of alternating layers of heterogeneous multi-agent self-attention and multi-scale window self-attention, which capture inter-agent interactions and per-agent spatial relationships. These key modules are designed in a unified Transformer architecture to handle common V2X challenges, including asynchronous information sharing, pose errors, and the heterogeneity of V2X components. Second, we propose an advanced architecture, V2X-ViTv2, with stronger multi-scale perception capability. We also propose data augmentation techniques tailored to V2X applications to further improve performance. We construct a large-scale V2X perception dataset using CARLA and OpenCDA to validate our approach. Extensive experimental results on both synthetic and real-world datasets show that V2X-ViTs achieve state-of-the-art performance for 3D object detection and remain robust even in harsh, noisy environments. All the code and trained models will be available at https://github.com/DerrickXuNu/OpenCOOD.
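To make the alternating-attention design concrete, below is a minimal structural sketch in PyTorch of an encoder block that interleaves an agent-wise attention step with a per-agent spatial attention step. The module and class names, the use of standard multi-head attention as a stand-in for the heterogeneous multi-agent and multi-scale window attention modules, and the tensor shapes are illustrative assumptions, not the authors' implementation (see the OpenCOOD repository for the actual code).

```python
# Structural sketch only: standard nn.MultiheadAttention stands in for the
# paper's heterogeneous multi-agent self-attention and multi-scale window
# self-attention; window partitioning and agent-type embeddings are omitted.
import torch
import torch.nn as nn


class V2XViTBlockSketch(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Stand-in for heterogeneous multi-agent self-attention:
        # tokens from different agents attend to one another.
        self.agent_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Stand-in for multi-scale window self-attention:
        # tokens within one agent's feature map attend to one another.
        self.spatial_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_agents, num_spatial_tokens, dim)
        b, a, s, d = x.shape

        # 1) Inter-agent attention: features from different agents at the same
        #    spatial location exchange information.
        x_agents = x.permute(0, 2, 1, 3).reshape(b * s, a, d)
        h = self.norm1(x_agents)
        h, _ = self.agent_attn(h, h, h)
        x_agents = x_agents + h
        x = x_agents.reshape(b, s, a, d).permute(0, 2, 1, 3)

        # 2) Per-agent spatial attention: each agent's own feature map is
        #    refined (the multi-scale windowing is omitted for brevity).
        x_spatial = x.reshape(b * a, s, d)
        h = self.norm2(x_spatial)
        h, _ = self.spatial_attn(h, h, h)
        x_spatial = x_spatial + h

        # 3) Standard Transformer feed-forward sub-layer.
        x_spatial = x_spatial + self.mlp(self.norm3(x_spatial))
        return x_spatial.reshape(b, a, s, d)


if __name__ == "__main__":
    block = V2XViTBlockSketch()
    feats = torch.randn(2, 3, 64, 256)  # 2 scenes, 3 agents, 64 spatial tokens
    print(block(feats).shape)  # torch.Size([2, 3, 64, 256])
```

Stacking several such blocks yields the alternating inter-agent / per-agent pattern described above; the full framework additionally compensates for communication delay, pose error, and agent heterogeneity inside the attention modules.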
