Abstract

The idea of cooperative perception for navigation assistance was introduced about a decade ago with the aim of increasing safety in dangerous areas such as intersections. More recently, roadside infrastructure has been introduced in this context to provide a new point of view of the scene. In this paper, we propose to combine the Vehicle-To-Vehicle (V2V) and Vehicle-To-Infrastructure (V2I) approaches, taking advantage of the elevated points of view offered by the infrastructure and the in-scene points of view offered by the vehicles, to build a semantic grid map of the moving elements in the scene. To create this map, we choose to use camera information and 2-Dimensional (2D) bounding boxes in order to minimize the impact on the network and, unlike state-of-the-art methods, we ignore possible depth information. We propose a framework based on two fusion methods, one based on Bayesian theory and the other on the Dempster-Shafer Theory (DST), to merge the information and choose a label for each cell of the semantic grid, and we compare the two to assess which fusion method performs best. Finally, we evaluate our approach on a set of datasets that we generated from the CARLA simulator, varying the proportion of Connected Vehicles (CVs) and the traffic density. We show the superiority of the DST-based method, with a gain in mean Intersection over Union (IoU) of up to 23.35% over the Bayesian method.
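To make the two fusion rules concrete, the sketch below illustrates cell-wise Bayesian fusion and Dempster's rule of combination for two sources (e.g., an infrastructure camera and a vehicle camera). It is a minimal illustration under assumed conventions, not the paper's implementation: the label set, the use of a single ignorance mass on the whole frame Omega, and the max-mass decision rule are all illustrative assumptions.

import numpy as np

# Illustrative label set for grid cells (an assumption, not the paper's).
LABELS = ["vehicle", "pedestrian", "free"]

def bayesian_fusion(p1, p2):
    """Cell-wise Bayesian fusion: normalized element-wise product
    of the two sources' class probability vectors."""
    prod = np.asarray(p1) * np.asarray(p2)
    return prod / prod.sum()

def dempster_fusion(m1, m2):
    """Dempster's rule of combination for masses on singleton labels
    plus the ignorance set 'Omega' (the whole frame of discernment).
    Masses are dicts mapping a label (or 'Omega') to a value in [0, 1]."""
    combined = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            if a == b:
                inter = a
            elif a == "Omega":
                inter = b
            elif b == "Omega":
                inter = a
            else:
                inter = None  # disjoint singletons contribute to conflict K
            if inter is None:
                conflict += ma * mb
            else:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
    # Normalize the non-conflicting mass by (1 - K).
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Example: the two points of view partially disagree on one cell.
p_infra, p_vehicle = [0.7, 0.2, 0.1], [0.5, 0.4, 0.1]
print(bayesian_fusion(p_infra, p_vehicle))

m_infra = {"vehicle": 0.6, "pedestrian": 0.1, "Omega": 0.3}
m_vehicle = {"vehicle": 0.4, "pedestrian": 0.3, "Omega": 0.3}
fused = dempster_fusion(m_infra, m_vehicle)
print(max(fused, key=fused.get))  # pick the label with the highest fused mass

Unlike the Bayesian product, the DST combination can keep part of the mass on the ignorance set Omega, which lets a source express "I don't know" for a cell it observes poorly; the max-mass decision shown here is one simple way to pick a cell label from the fused masses.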
