Abstract

Semantic segmentation has been used successfully as a complementary information source in pedestrian detection. However, it requires accurate pixel-level semantic segmentation annotations for training, which are extremely time-consuming to obtain. In this work, we solve this problem by using weak segmentation masks automatically generated from depth images. This enables joint semantic segmentation and pedestrian detection training with only ground-truth bounding boxes. We show that this joint training boosts the performance of the pedestrian detector. Moreover, we show that fusing the outputs of the classification network with the generated segmentation masks leads to a further improvement in detection performance. Extensive experiments on three RGB-D pedestrian datasets demonstrate the effectiveness of our proposed method. As a byproduct, we also obtain good-quality pedestrian segmentation results without using any pixel-level segmentation annotations during training.
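To make the idea of depth-derived weak masks concrete, below is a minimal sketch of one plausible way to generate such a mask from a depth image and a ground-truth bounding box: pixels inside the box whose depth lies near the box's median depth are labeled as pedestrian. The function name `weak_mask_from_depth`, the median-depth heuristic, and the `rel_tol` tolerance are illustrative assumptions; the paper's actual mask-generation procedure may differ.

```python
import numpy as np

def weak_mask_from_depth(depth, box, rel_tol=0.15):
    """Hypothetical sketch: derive a weak pedestrian mask from depth.

    depth:   (H, W) float array of depth values (e.g., meters).
    box:     (x1, y1, x2, y2) ground-truth pedestrian bounding box.
    rel_tol: relative depth tolerance around the estimated person depth
             (assumed value, not from the paper).
    """
    x1, y1, x2, y2 = box
    mask = np.zeros(depth.shape, dtype=np.uint8)
    crop = depth[y1:y2, x1:x2]
    valid = crop > 0                       # ignore missing/invalid depth readings
    if not valid.any():
        return mask                        # no usable depth: empty weak mask
    person_depth = np.median(crop[valid])  # pedestrian assumed to dominate the box
    near = valid & (np.abs(crop - person_depth) < rel_tol * person_depth)
    mask[y1:y2, x1:x2] = near.astype(np.uint8)
    return mask
```

A mask produced this way is "weak" in the sense that it is noisy at object boundaries and may leak onto objects at a similar depth, but it is free to generate at scale from bounding-box annotations alone, which is what enables the joint training described above.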
