Abstract
Object detection using remote sensing data is a key task for the perception systems of self-driving vehicles. Although many generic deep learning architectures have been proposed for this problem, there is little guidance on their suitability for a particular scenario such as autonomous driving. In this work, we assess the performance of existing 2D detection systems on a multi-class problem (vehicles, pedestrians, and cyclists) with images obtained from the on-board camera sensors of a car. We evaluate several one-stage (RetinaNet, FCOS, and YOLOv3) and two-stage (Faster R-CNN) deep learning meta-architectures under different image resolutions and feature extractors (ResNet, ResNeXt, Res2Net, DarkNet, and MobileNet). These models are trained using transfer learning and compared in terms of both precision and efficiency, with special attention to the real-time requirements of this context. For the experimental study, we use the Waymo Open Dataset, the largest existing benchmark. Despite the rising popularity of one-stage detectors, our findings show that two-stage detectors still provide the most robust performance. Faster R-CNN models outperform one-stage detectors in accuracy and are also more reliable in the detection of minority classes. Faster R-CNN with a Res2Net-101 backbone achieves the best speed/accuracy trade-off but needs lower-resolution images to reach real-time speed. Furthermore, the anchor-free FCOS detector is a slightly faster alternative to RetinaNet, with similar precision and lower memory usage.
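As a concrete illustration of the transfer-learning setup evaluated here, the sketch below builds a Faster R-CNN detector from a COCO-pretrained ResNet-50 FPN backbone and swaps its box head for the three Waymo classes. This is a minimal sketch assuming the torchvision API; the paper does not specify its framework, and backbones such as Res2Net-101 or DarkNet-53 would come from other toolkits.

    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

    # Start from a Faster R-CNN with a ResNet-50 FPN backbone pretrained on COCO,
    # so only the detection head needs to be retrained for the new label set.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

    # Waymo-style label set: background + vehicle, pedestrian, cyclist.
    num_classes = 4

    # Replace the classification/box-regression head to match the new classes.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

Fine-tuning then proceeds as usual, passing lists of image tensors and target dictionaries to the model in training mode.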
Highlights
The increase in availability and quality of remote sensing data provided by modern multi-modal sensors has made it possible to push the state of the art in many computer vision tasks
We study the combination of one-stage (RetinaNet, Fully Convolutional One-Stage Object Detector (FCOS), and YOLOv3) and two-stage (Faster R-CNN) meta-architectures with different feature extractors (ResNet-50, ResNet-101, ResNet-152, ResNeXt-101, Res2Net-101, DarkNet-53, MobileNet V1, and MobileNet V2)
We present an experimental study comparing the performance of several deep learning-based object detection systems in the context of autonomous vehicles
Summary
The increase in availability and quality of remote sensing data provided by modern multi-modal sensors has made it possible to push the state of the art in many computer vision tasks. The data provided by high-resolution cameras and proximity sensors have helped to develop more powerful machine learning models that have achieved unprecedented results in visual recognition problems [1]. These developments have significantly improved the perception systems used in many applications such as autonomous driving [2,3], security surveillance [4], and land monitoring [5]. One of the essential tasks that an advanced driver assistance system (ADAS) needs to address is object detection: these remote sensing systems need to detect traffic targets in real time in order to make informed driving decisions, and they have to be robust enough to operate effectively in complex scenarios such as adverse weather, poor lighting, or occluded objects.
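Because the real-time constraint drives the speed/accuracy comparison, inference throughput has to be measured alongside precision. The helper below is a hypothetical PyTorch sketch (measure_fps and its setup are assumptions, not part of the paper) that times a detector image by image, as it would run on board, and reports frames per second.

    import time
    import torch

    @torch.no_grad()
    def measure_fps(model, images, device="cuda", warmup=5):
        # Rough end-to-end throughput estimate for a detection model.
        model.eval().to(device)
        images = [img.to(device) for img in images]
        for img in images[:warmup]:
            model([img])                 # warm-up passes (allocators, cuDNN autotuning)
        if device == "cuda":
            torch.cuda.synchronize()     # finish queued GPU work before starting the clock
        start = time.perf_counter()
        for img in images:
            model([img])                 # one image per forward pass, as in deployment
        if device == "cuda":
            torch.cuda.synchronize()
        return len(images) / (time.perf_counter() - start)

A detector meets the real-time requirement when its FPS at the chosen input resolution reaches the camera frame rate.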