Abstract

There has recently been growing interest in using multimodal sensors to achieve robust lane line segmentation. In this paper, we introduce a novel multimodal fusion architecture from an information theory perspective and demonstrate its practical utility with Light Detection and Ranging (LiDAR)-camera fusion networks. In particular, we formulate a multimodal fusion network as a joint coding model, in which each node, layer, and pipeline is represented as a communication channel and forward propagation corresponds to information transmission over that channel. This framing allows us to analyze the effect of different fusion approaches both qualitatively and quantitatively. We argue that an optimal fusion architecture depends on the essential capacity and its allocation, as determined by the source and channel models. To test this multimodal fusion hypothesis, we progressively construct a series of multimodal models based on the proposed fusion methods and evaluate them on the KITTI benchmark and the A2D2 dataset. Our optimal fusion network achieves over 85% lane line accuracy and over 98.7% overall accuracy. We conclude that the performance gap with respect to state-of-the-art models demonstrates the potential of our fusion architecture to serve as a new benchmark resource for the multimodal deep learning community.
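
For readers who want a concrete picture of what a LiDAR-camera fusion network looks like in code, the sketch below shows a minimal feature-level fusion module in PyTorch. It is an illustration under our own assumptions (module names, channel sizes, and simple concatenation-based fusion are hypothetical), not the joint coding architecture proposed in the paper.

```python
# Illustrative sketch only: a minimal camera + LiDAR feature-level fusion block
# in PyTorch. Module names, channel sizes, and the concatenation-based fusion
# are assumptions for demonstration; they do not reproduce the paper's
# joint-coding architecture.
import torch
import torch.nn as nn


class SimpleFusionSegNet(nn.Module):
    def __init__(self, cam_channels=3, lidar_channels=1, num_classes=2):
        super().__init__()
        # Per-modality encoders: each pipeline can be viewed as a separate
        # channel carrying information from its sensor.
        self.cam_encoder = nn.Sequential(
            nn.Conv2d(cam_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.lidar_encoder = nn.Sequential(
            nn.Conv2d(lidar_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Fusion by channel-wise concatenation followed by a 1x1 convolution
        # that re-encodes the two feature streams into a joint representation.
        self.fusion = nn.Conv2d(64, 64, kernel_size=1)
        # Per-pixel segmentation head.
        self.head = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, cam, lidar):
        f_cam = self.cam_encoder(cam)        # camera feature map
        f_lidar = self.lidar_encoder(lidar)  # projected LiDAR feature map
        fused = torch.cat([f_cam, f_lidar], dim=1)
        fused = torch.relu(self.fusion(fused))
        return self.head(fused)              # per-pixel class logits


if __name__ == "__main__":
    net = SimpleFusionSegNet()
    cam = torch.randn(1, 3, 128, 256)    # RGB image
    lidar = torch.randn(1, 1, 128, 256)  # LiDAR depth/intensity in image plane
    print(net(cam, lidar).shape)         # torch.Size([1, 2, 128, 256])
```

In this toy example, mid-level fusion (concatenating encoded feature maps) stands in for one point on the design spectrum the abstract alludes to; other choices, such as fusing raw inputs early or fusing predictions late, correspond to allocating the network's capacity to the modalities at different stages.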
