Abstract

With the recent surge in autonomous vehicles, accurate vehicle detection and tracking is more critical than ever. Detecting vehicles with visual sensors fails in non-line-of-sight (NLOS) settings, which can be compensated for by including other modalities in a multi-domain sensing environment. In this paper, we propose deep learning based frameworks for fusing multiple modalities (image, radar, acoustic, seismic) by exploiting the complementary latent embeddings across modalities and incorporating several state-of-the-art fusion strategies. Our proposed fusion frameworks perform considerably better than unimodal detection. Moreover, fusion between image and non-image modalities makes it possible to detect and track vehicles even when a direct line of sight (LOS) is not available. We validate our models on the real-world multimodal ESCAPE dataset, showing a 33.16% improvement in vehicle detection performance from fusion (over visual inference alone) while 30-42% of the test scenarios have vehicles in NLOS. To showcase the generalization of the proposed frameworks across datasets, we also validate our models on the multimodal nuScenes dataset, showing a ~22% improvement over competing methods.
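To make the fusion idea concrete, the sketch below shows one common way to combine per-modality latent embeddings: each modality is encoded into a shared latent space and the embeddings are concatenated before a detection head. This is a minimal illustration only; the abstract does not specify the paper's actual encoders, fusion strategies, or dimensions, so all class names, layer sizes, and modality feature dimensions here are hypothetical.

```python
# Minimal sketch of latent-embedding fusion for vehicle detection.
# Assumptions (not from the paper): MLP encoders, concatenation fusion,
# and pre-extracted per-modality feature vectors of fixed size.
import torch
import torch.nn as nn

class LateFusionDetector(nn.Module):
    """Fuses per-modality latent embeddings for a detection decision."""
    def __init__(self, input_dims, embed_dim=128, num_classes=2):
        super().__init__()
        # One encoder per modality (e.g., image, radar, acoustic, seismic),
        # each projecting its raw features into a shared latent size.
        self.encoders = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(dim, embed_dim), nn.ReLU())
            for name, dim in input_dims.items()
        })
        # Classification head over the concatenated latent embeddings.
        self.head = nn.Sequential(
            nn.Linear(embed_dim * len(input_dims), embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, num_classes),
        )

    def forward(self, inputs):
        # Encode each modality, then concatenate the latent vectors.
        latents = [self.encoders[name](x) for name, x in inputs.items()]
        return self.head(torch.cat(latents, dim=-1))

# Usage with dummy features for the four modalities named in the abstract.
model = LateFusionDetector(
    {"image": 512, "radar": 64, "acoustic": 128, "seismic": 128}
)
batch = {
    "image": torch.randn(4, 512),
    "radar": torch.randn(4, 64),
    "acoustic": torch.randn(4, 128),
    "seismic": torch.randn(4, 128),
}
logits = model(batch)  # shape: (4, 2)
```

Because the non-image encoders contribute independently of the camera stream, a detector of this shape can still produce a prediction when the visual input is uninformative, which is the intuition behind the NLOS gains reported above.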
