Abstract
Pedestrian detection is a particular case of object detection that helps to reduce accidents in advanced driver-assistance systems and autonomous vehicles. It is not an easy task because of the variability of the objects and the strict time constraints. A performance comparison of object detection methods, including both GPU and non-GPU implementations over a variety of on-road specific databases, is provided. Computer vision multi-class object detection can be integrated into sensor fusion modules, where recall is preferred over precision. For this reason, ad hoc training with a single class for pedestrians has been performed, achieving a significant increase in recall. Experiments have been carried out on several architectures, and a special effort has been devoted to achieving a feasible computational time for a real-time system. Finally, an analysis of the input image size allows us to fine-tune the model and obtain better results at practical costs.
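The recall-over-precision preference mentioned above can be made concrete with the standard detection metrics. Below is a minimal, hypothetical sketch (the function name, the example counts, and the IoU matching criterion are our own illustrative choices, not taken from the paper) of how the two quantities are computed from matched detections.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Standard detection metrics: a true positive (tp) is a predicted box
    that matches a ground-truth box (e.g. IoU >= 0.5)."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # fraction of detections that are correct
    recall = tp / (tp + fn) if tp + fn else 0.0     # fraction of pedestrians that are found
    return precision, recall

# In a sensor-fusion pipeline, missed pedestrians (fn) are costlier than
# spurious boxes (fp), which later fusion stages can discard, so recall
# is the metric to maximize. Example counts below are made up.
print(precision_recall(tp=80, fp=40, fn=10))
```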
Highlights
Object detection is a central problem in Computer Vision
Pedestrian detection constitutes one of the most challenging tasks in on-road object detection for two main reasons: the variability of the objects and the strict time constraints
While generic-class databases, such as COCO [20], are the starting point for developing general detection algorithms, we focused our effort on specific on-road and human image databases
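As an illustration of deriving a single-class pedestrian set from a generic-class database, the following sketch uses the public pycocotools API to keep only 'person' annotations from COCO. This is our own example under assumed file paths; the paper's exact relabelling pipeline may differ.

```python
from pycocotools.coco import COCO

# Path is a placeholder; point it at a real COCO annotation file.
coco = COCO("annotations/instances_train2017.json")

# Restrict the generic 80-class dataset to the single 'person' category,
# mirroring the single-class (pedestrian) training described in the abstract.
person_cat_ids = coco.getCatIds(catNms=["person"])
person_img_ids = coco.getImgIds(catIds=person_cat_ids)
ann_ids = coco.getAnnIds(imgIds=person_img_ids, catIds=person_cat_ids, iscrowd=False)
annotations = coco.loadAnns(ann_ids)

print(f"{len(person_img_ids)} images with {len(annotations)} pedestrian boxes")
```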
Summary
Object detection is a central problem in Computer Vision. Its goal is to detect the location and class of each object in images or image sequences. Two-stage algorithms predict detections in two phases: first, they use pixel-level spatial features to extract Regions of Interest; then, a second phase classifies each proposal to decide whether the region contains a pedestrian or not. These methods usually produce better detection results, but they are more computationally expensive [16], so they are less used in real-time (RT) detection tasks because of the limited computational power of the resource-constrained devices usually installed on-board.

NuScenes is a novel, public, large-scale dataset for autonomous driving. It includes data from the full sensor suite of a self-driving car (RADAR, LiDAR, cameras, IMU and GPS), with more than 1.4 million camera images, and it provides manually labelled annotations for 23 classes, including VRUs.

The databases used are joined to obtain a complete dataset that tries to represent as much variability as possible, including different image sizes and aspect ratios, weather conditions, cities and roads, and a wide range of light conditions (see Figure 1). This allows the architecture to run in RT even on non-GPU resource-constrained systems.
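To make the two-stage pipeline concrete, here is a minimal sketch using torchvision's Faster R-CNN, a canonical two-stage detector in which the region proposal network plays the role of the first phase and the box classifier the second. This is our illustration, not the paper's exact architecture; the input tensor and the score threshold are assumed values.

```python
import torch
import torchvision

# Faster R-CNN: stage 1 (the RPN) proposes Regions of Interest from
# pixel-level features; stage 2 classifies and refines each proposal.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 640, 480)  # placeholder; a real RGB tensor scaled to [0, 1]
with torch.no_grad():
    prediction = model([image])[0]  # dict with 'boxes', 'labels', 'scores'

# Keep confident pedestrian detections (COCO label 1 == 'person';
# the 0.5 score threshold is an assumed value, not the paper's).
keep = (prediction["labels"] == 1) & (prediction["scores"] > 0.5)
pedestrian_boxes = prediction["boxes"][keep]
print(pedestrian_boxes)
```

NuScenes ships with a public devkit; the following sketch shows one way to pull a front-camera frame and its pedestrian annotations. The dataroot path and the 'v1.0-mini' split are placeholders, and this is only one possible access pattern.

```python
from nuscenes.nuscenes import NuScenes

# Placeholders: install nuscenes-devkit and point dataroot at a local copy.
nusc = NuScenes(version="v1.0-mini", dataroot="/data/sets/nuscenes", verbose=False)

sample = nusc.sample[0]                  # one keyframe across all sensors
cam_token = sample["data"]["CAM_FRONT"]  # front-camera image for this keyframe
cam_data = nusc.get("sample_data", cam_token)
print("image file:", cam_data["filename"])

# VRU annotations: NuScenes category names for pedestrians all start
# with 'human.pedestrian' (adult, child, construction_worker, ...).
for ann_token in sample["anns"]:
    ann = nusc.get("sample_annotation", ann_token)
    if ann["category_name"].startswith("human.pedestrian"):
        print(ann["category_name"], ann["translation"])
```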