Object detection is an essential computer vision task that identifies and locates objects within images or videos and is crucial for applications such as autonomous driving, robotics, and augmented reality. Light Detection and Ranging (LiDAR) and camera sensors are widely used for reliable object detection. These sensors produce heterogeneous data due to differences in data format, spatial resolution, and environmental responsiveness. Existing review articles on object detection predominantly focus on the statistical analysis of fusion algorithms, often overlooking the complexities of aligning data from these distinct modalities, especially data alignment in dynamic environments. This paper addresses the challenges of heterogeneous LiDAR-camera alignment in dynamic environments by surveying over 20 alignment methods for three-dimensional (3D) object detection, focusing on research published between 2019 and 2024. This study introduces the core concepts of multimodal 3D object detection, emphasizing the importance of integrating data from different sensor modalities for accurate object recognition in dynamic environments. The survey then provides a detailed comparison of recent heterogeneous alignment methods, analyzing critical approaches in the literature and identifying their strengths and limitations. A classification of methods for aligning heterogeneous data in 3D object detection is presented. This paper also highlights the critical challenges in aligning multimodal data, including dynamic environments, sensor fusion, scalability, and real-time processing. These limitations are thoroughly discussed, and potential future research directions are proposed to address current gaps and advance the state-of-the-art. By summarizing the latest advancements and highlighting open challenges, this survey aims to stimulate further research and innovation in heterogeneous alignment methods for multimodal 3D object detection, thereby pushing the boundaries of what is currently achievable in this rapidly evolving domain.
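To make the notion of LiDAR-camera "alignment" concrete, the sketch below shows the classical geometric baseline: projecting 3D LiDAR points into the camera image plane through an extrinsic rotation and translation followed by the pinhole intrinsic model. The calibration matrices, offsets, and synthetic point cloud are illustrative assumptions, not values from any method covered in the survey; the surveyed approaches build on or learn to correct this kind of fixed mapping when the scene or sensor geometry changes.

```python
import numpy as np

def project_lidar_to_image(points_lidar, R, t, K):
    """Project 3D LiDAR points into the camera image plane.

    points_lidar : (N, 3) points in the LiDAR frame (meters)
    R, t         : extrinsic rotation (3x3) and translation (3,) from LiDAR to camera
    K            : camera intrinsic matrix (3x3)
    Returns (M, 2) pixel coordinates for points in front of the camera.
    """
    # Transform points from the LiDAR frame into the camera frame.
    points_cam = points_lidar @ R.T + t

    # Keep only points with positive depth (in front of the camera).
    points_cam = points_cam[points_cam[:, 2] > 0]

    # Pinhole projection: homogeneous pixel = K @ point, then divide by depth.
    pixels_h = points_cam @ K.T
    return pixels_h[:, :2] / pixels_h[:, 2:3]

# Illustrative calibration values (assumptions for this sketch only).
K = np.array([[700.0,   0.0, 640.0],
              [  0.0, 700.0, 360.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                       # sensors assumed axis-aligned in this toy example
t = np.array([0.0, -0.1, -0.2])     # small fixed offset between LiDAR and camera

points = np.random.uniform(-10.0, 10.0, size=(1000, 3))  # synthetic LiDAR sweep
pixel_coords = project_lidar_to_image(points, R, t, K)
```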