Abstract

In this paper we address the problem of automatic pedestrian parsing in surveillance video when only a small number of training samples are available. Although human parsing has achieved great success with high-capacity models, parsing pedestrians under practical surveillance conditions remains challenging, because complex environmental interference requires more pixel-level training samples to fit. However, creating large datasets with pixel-level labels is extremely costly because of the vast amount of human effort required. Our method captures pedestrian information from unlabeled datasets and uses it to update the trained model through reinforcement learning, achieving strong performance with far fewer pixel-level labeled samples. Both quantitative and qualitative experiments on practical surveillance datasets demonstrate the effectiveness of the proposed method.
