Safe interaction with pedestrians is essential for autonomous vehicles in urban environments. A comprehensive understanding of pedestrian behavior remains challenging for autonomous driving, because different pedestrian attributes have different characteristics and no existing dataset provides labels for all of them. In this paper, we design an efficient multi-task model for pedestrian detection, tracking, and attribute recognition that mines information across multiple tasks and multiple datasets while reducing redundant image-processing computation. Multiple training datasets are combined to take full advantage of the available partially-labeled data. The model consists of two stages. The first stage detects and tracks pedestrians using only low-resolution images to save computational resources: detection is performed in the bird's-eye view (BEV), and tracking is based on 3D bounding boxes. The second stage performs image-based multi-attribute recognition, using the detection and tracking results and pre-processed intermediate information from the first stage together with high-resolution images to identify multiple pedestrian attributes such as pose, age, gender, and motion direction. With a carefully designed multi-task system that shares the necessary feature maps among tasks and reuses intermediate computation, the framework solves pedestrian detection, tracking, and multi-attribute recognition efficiently, leading to more accurate decisions at a lower computational cost. The novelty of the model lies in simultaneously detecting multiple pedestrian attributes for autonomous driving, fully sharing features within the model to avoid repeated computation, and exploiting multiple partially-labeled datasets to train the detection, tracking, and attribute-recognition model.
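To make the two core ideas concrete, the sketch below is a minimal, illustrative PyTorch example, not the authors' implementation: it shows (a) a single shared backbone whose feature maps feed a detection head and several attribute heads, and (b) a masked loss that lets each partially-labeled dataset supervise only the tasks it actually annotates. All module names, layer sizes, attribute class counts, and the `masked_multi_task_loss` helper are assumptions for illustration; the paper's actual first stage operates on BEV/3D inputs that this image-only sketch omits.

```python
# Hypothetical sketch of the feature-sharing and partial-label ideas;
# not the paper's architecture. Sizes and task heads are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedBackbone(nn.Module):
    """Extracts feature maps that are computed once and reused by every head."""
    def __init__(self, out_channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_channels, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class MultiTaskPedestrianModel(nn.Module):
    def __init__(self, num_poses=4, num_ages=3, num_directions=8):
        super().__init__()
        self.backbone = SharedBackbone()
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Stand-in stage-1 head: pedestrian-vs-background score per crop
        # (the paper's BEV detector additionally uses 3D geometry).
        self.det_head = nn.Linear(64, 1)
        # Stage-2 attribute heads over the same shared features.
        self.pose_head = nn.Linear(64, num_poses)
        self.age_head = nn.Linear(64, num_ages)
        self.gender_head = nn.Linear(64, 2)
        self.dir_head = nn.Linear(64, num_directions)

    def forward(self, images):
        feats = self.pool(self.backbone(images)).flatten(1)  # shared once
        return {
            "det": self.det_head(feats).squeeze(-1),
            "pose": self.pose_head(feats),
            "age": self.age_head(feats),
            "gender": self.gender_head(feats),
            "direction": self.dir_head(feats),
        }

def masked_multi_task_loss(outputs, labels, label_masks):
    """Sum losses only over tasks that the current dataset labels."""
    loss = torch.tensor(0.0)
    for task, logits in outputs.items():
        if not label_masks.get(task, False):
            continue  # this dataset carries no labels for this task
        if task == "det":
            loss = loss + F.binary_cross_entropy_with_logits(
                logits, labels[task].float())
        else:
            loss = loss + F.cross_entropy(logits, labels[task])
    return loss

if __name__ == "__main__":
    model = MultiTaskPedestrianModel()
    crops = torch.randn(4, 3, 64, 64)  # batch of pedestrian image crops
    outputs = model(crops)
    # Example: a dataset labeled only for detection and gender.
    labels = {"det": torch.ones(4), "gender": torch.randint(0, 2, (4,))}
    masks = {"det": True, "gender": True}
    print(masked_multi_task_loss(outputs, labels, masks))
```

The backbone forward pass runs once per image and its features are reused by every head, which is the source of the computational savings claimed in the abstract; the per-task mask is what allows datasets with disjoint label sets to be mixed in one training loop.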