Abstract

Scene understanding through pixel-level semantic parsing is one of the core problems in computer vision. To date, image-based methods and datasets for scene parsing have been well explored. However, the real world is inherently dynamic rather than static, so learning to perform video scene parsing is more practical for real-world applications. Since few datasets cover an extensive range of scenes and object categories with temporal pixel-level annotations, in this work we present a large-scale video scene parsing dataset, namely VSPW (Video Scene Parsing in the Wild). Specifically, VSPW contains a total of 251,633 frames from 3,536 videos with dense pixel-wise annotations, covering a large variety of 231 scenes and 124 object categories. Moreover, VSPW is densely annotated at a high frame rate of 15 fps, and over 96% of its videos have high spatial resolutions ranging from 720P to 4K. To the best of our knowledge, VSPW is the first attempt to address the challenging video scene parsing task in the wild across diverse scenes. Based on VSPW, we further propose Temporal Attention Blending (TAB) Networks, which harness temporal context information for better pixel-level semantic understanding of videos. Extensive experiments on VSPW demonstrate the superiority of the proposed TAB over baseline approaches. We hope the newly proposed dataset and the explorations in this work can help advance the challenging yet practical video scene parsing task. Both the dataset and the code are available at www.vspwdataset.com.
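The abstract only names Temporal Attention Blending without describing its architecture, so the following is a minimal illustrative sketch of the general idea: attention weights computed across frames blend temporal features into the current frame's representation. The module name, the query/key projections, and all shapes are assumptions for illustration, not the paper's actual TAB design.

```python
# Illustrative sketch of attention-based temporal feature blending (PyTorch).
# NOTE: this is an assumed design, not the VSPW paper's TAB architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalAttentionBlend(nn.Module):
    """Blend per-frame features using attention weights over time."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions project features into query/key spaces (assumed).
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (T, C, H, W) features for T consecutive frames;
        # the last frame is the one being parsed.
        T, C, H, W = feats.shape
        q = self.query(feats[-1:])   # (1, C, H, W): current frame
        k = self.key(feats)          # (T, C, H, W): all frames
        # Per-pixel similarity between the current frame and each frame.
        sim = (q * k).sum(dim=1, keepdim=True) / C ** 0.5  # (T, 1, H, W)
        attn = F.softmax(sim, dim=0)                       # weights over time
        # Attention-weighted blend of temporal features.
        return (attn * feats).sum(dim=0, keepdim=True)     # (1, C, H, W)

# Usage: blend backbone features from a 4-frame clip.
feats = torch.randn(4, 256, 64, 64)
blended = TemporalAttentionBlend(256)(feats)
print(blended.shape)  # torch.Size([1, 256, 64, 64])
```

The per-pixel softmax over the temporal axis lets each spatial location decide how much to borrow from neighboring frames, which is one common way to exploit the temporal context the abstract refers to.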
