Abstract

Existing approaches to semantic segmentation in videos usually decode each frame as an RGB image and then apply a standard image-based semantic segmentation model frame by frame, which is time-consuming. In this paper, we tackle this problem by exploiting the nature of video compression. A compressed video contains three types of frames: I-frames, P-frames, and B-frames. I-frames are stored as regular images, P-frames are stored as motion vectors and residual errors, and B-frames are bidirectionally predicted frames that can be regarded as a special case of P-frames. We propose a method that operates directly on I-frames (as RGB images) and P-frames (motion vectors and residual errors) in a video. Our model uses a ConvLSTM to capture the temporal information required for producing semantic segmentation on P-frames. Our experimental results show that our method runs much faster than the alternatives while achieving comparable accuracy.
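The abstract does not detail how the P-frame data is consumed, but the core intuition behind compressed-domain segmentation can be illustrated with one plausible step: instead of re-running a full segmentation network on every frame, the motion vectors already stored in a P-frame can warp the previous frame's label map forward. The sketch below is a minimal NumPy illustration of that idea under assumed conventions (integer per-pixel motion vectors pointing back to source pixels); the function name `propagate_labels` is hypothetical and this is not the paper's actual ConvLSTM model.

```python
import numpy as np

def propagate_labels(prev_seg, motion_vectors):
    # prev_seg: (H, W) integer label map from the previous frame.
    # motion_vectors: (H, W, 2) integer (dy, dx) offsets; each pixel in the
    # current frame copies the label from prev_seg[y + dy, x + dx].
    h, w = prev_seg.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(ys + motion_vectors[..., 0], 0, h - 1)
    src_x = np.clip(xs + motion_vectors[..., 1], 0, w - 1)
    return prev_seg[src_y, src_x]

# Toy example: a single "object" pixel labeled 5 moves one pixel to the left
# between frames, so every current pixel sources from the pixel to its right.
prev_seg = np.zeros((4, 4), dtype=int)
prev_seg[1, 1] = 5
mv = np.zeros((4, 4, 2), dtype=int)
mv[..., 1] = 1
warped = propagate_labels(prev_seg, mv)  # label 5 now sits at (1, 0)
```

Residual errors and the ConvLSTM would then refine this warped prediction, since motion compensation alone cannot account for appearance changes.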
