Video Instance-Level Human Parsing

Liang Lin, Ping Luo, Dongyu Zhang, Wangmeng Zuo

doi:10.1007/978-981-13-2387-4_7

Abstract

This chapter introduces a novel Adaptive Temporal Encoding Network (ATEN) that alternates between temporal encoding among key frames and flow-guided feature propagation for the consecutive frames lying between two key frames. Specifically, ATEN first uses a Parsing-RCNN, which integrates global human parsing and instance-level human segmentation into a unified model, to produce the instance-level parsing result for each key frame. To balance accuracy and efficiency, flow-guided feature propagation is used to parse the consecutive frames directly, according to their identified temporal consistency with the key frames. In addition, ATEN leverages convolutional gated recurrent units (convGRU) to exploit temporal changes over a series of key frames, which are further used to facilitate instance-level parsing at the frame level. By alternating between direct feature propagation among consistent frames and temporal encoding among key frames, ATEN achieves a good balance between frame-level accuracy and time efficiency, a common and crucial problem in video object segmentation research.
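The alternation described above can be illustrated with a minimal scheduling sketch. It is not the authors' implementation: the function name `schedule_frames`, the fixed `key_interval`, and the choice of the most recent key frame as the propagation source are all assumptions made for illustration. The abstract does not state how key frames are selected, and the network components themselves (Parsing-RCNN, optical flow, convGRU) are not modeled here.

```python
from typing import List, Tuple


def schedule_frames(num_frames: int, key_interval: int) -> List[Tuple[int, str, int]]:
    """Return (frame_index, mode, reference_key_index) for every frame.

    Assumption: key frames occur at a fixed interval (hypothetical parameter).
    mode == "key"       -> the frame would be parsed directly and would update
                           the temporal encoding over key frames.
    mode == "propagate" -> the frame would reuse features warped from the
                           most recent key frame instead of being parsed anew.
    """
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    if key_interval < 1:
        raise ValueError("key_interval must be at least 1")

    plan = []
    for t in range(num_frames):
        # Index of the most recent key frame at or before frame t.
        key = (t // key_interval) * key_interval
        mode = "key" if t == key else "propagate"
        plan.append((t, mode, key))
    return plan


if __name__ == "__main__":
    for frame, mode, key in schedule_frames(num_frames=7, key_interval=3):
        print(f"frame {frame}: {mode} (reference key frame {key})")
```

Under these assumptions, only every `key_interval`-th frame incurs the cost of full parsing, which is the source of the accuracy and efficiency trade-off the abstract refers to.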

Full Text