Abstract
Limited spatio-temporal resolution in dynamic scenes makes the segmentation of foreground objects problematic: large displacements of corresponding points between consecutive frames cause candidate objects to be missed and motion boundaries to be overfilled. To alleviate these problems, we introduce a general framework for agnostic attribute video object segmentation, a novel method suited to segmenting foreground objects in dynamic scenes at low spatio-temporal resolution. We employ a fully convolutional network (FCN) to estimate class-agnostic object proposals from semantic classification attributes. Instead of deriving a hard object classification directly, we fuse the top-ranked soft scores in the semantic space, which allows the model to estimate probabilistic foreground hypotheses directly. A unified conditional random field (CRF) model incorporates the proposal information derived from these soft prediction scores into a unary energy functional, augmented with location and appearance potentials. The pairwise energy functional simultaneously imposes spatial and temporal consistency constraints on the appearance, location, and unary potentials. Experiments on spatio-temporally subsampled video segmentation benchmarks demonstrate that the proposed method robustly segments class-agnostic objects in dynamic scenes despite abrupt motion and large displacements caused by limited spatio-temporal resolution.
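The two core steps named above, fusing top-ranked soft scores into a class-agnostic foreground probability and minimizing a CRF energy with spatial and temporal pairwise terms, can be illustrated with a small, dependency-free sketch. This is not the authors' implementation, and every name in it is hypothetical. It assumes that class 0 is background, that the fused score is the sum of the k largest non-background softmax scores at each pixel, that the unary term is the negative log of that fused probability (the location and appearance potentials are omitted), and that the pairwise terms are Potts penalties `w_s` on 4-connected spatial neighbours and `w_t` on the same pixel in adjacent frames, with no optical-flow alignment. Iterated conditional modes (ICM) is swapped in as a simple optimizer; the paper's actual inference procedure is not stated in the abstract.

```python
import math

# Hypothetical layouts: probs[t][i][j] is a list of per-class softmax scores;
# labels[t][i][j] is 1 for foreground and 0 for background.

def fuse_top_k(scores, k, background=0):
    """Class-agnostic foreground probability: sum of the k largest non-background scores."""
    fg = sorted((s for c, s in enumerate(scores) if c != background), reverse=True)
    return min(1.0, sum(fg[:k]))

def foreground_map(probs, k):
    """Apply fuse_top_k to every pixel of every frame."""
    return [[[fuse_top_k(px, k) for px in row] for row in frame] for frame in probs]

def unary(p_fg, label, eps=1e-6):
    """Negative log-likelihood of a label under the fused foreground probability."""
    p = min(max(p_fg, eps), 1.0 - eps)
    return -math.log(p if label == 1 else 1.0 - p)

def neighbours(t, i, j, T, H, W):
    """Yield ((t, i, j), kind): 4-connected spatial ('s') and same-pixel temporal ('t') neighbours."""
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < H and 0 <= nj < W:
            yield (t, ni, nj), "s"
    for dt in (-1, 1):
        if 0 <= t + dt < T:
            yield (t + dt, i, j), "t"

def energy(fg, labels, w_s, w_t):
    """Total CRF energy: unary terms plus Potts penalties for each disagreeing neighbour pair."""
    T, H, W = len(fg), len(fg[0]), len(fg[0][0])
    e = 0.0
    for t in range(T):
        for i in range(H):
            for j in range(W):
                e += unary(fg[t][i][j], labels[t][i][j])
                for (tt, ii, jj), kind in neighbours(t, i, j, T, H, W):
                    if labels[tt][ii][jj] != labels[t][i][j]:
                        # Each pair is visited from both ends, so count half each time.
                        e += 0.5 * (w_s if kind == "s" else w_t)
    return e

def icm(fg, w_s=0.5, w_t=0.5, iters=5):
    """Greedy ICM: give each pixel the label with the lowest local energy until nothing changes."""
    T, H, W = len(fg), len(fg[0]), len(fg[0][0])
    labels = [[[int(p > 0.5) for p in row] for row in frame] for frame in fg]
    for _ in range(iters):
        changed = False
        for t in range(T):
            for i in range(H):
                for j in range(W):
                    cost = []
                    for lab in (0, 1):
                        c = unary(fg[t][i][j], lab)
                        for (tt, ii, jj), kind in neighbours(t, i, j, T, H, W):
                            if labels[tt][ii][jj] != lab:
                                c += w_s if kind == "s" else w_t
                        cost.append(c)
                    best = 0 if cost[0] <= cost[1] else 1
                    if best != labels[t][i][j]:
                        labels[t][i][j] = best
                        changed = True
        if not changed:
            break
    return labels
```

For example, in a two-frame 3x3 clip where every pixel scores `[0.1, 0.6, 0.3]` except one weak pixel scoring `[0.6, 0.25, 0.15]`, `foreground_map(probs, k=2)` gives about 0.9 and 0.4 respectively. Thresholding alone labels the weak pixel as background, whereas `icm(fg, w_s=0.5, w_t=0.5)` relabels it as foreground because its spatial and temporal neighbours agree. This toy case shows how the pairwise terms can restore a missed candidate object. Because each ICM update can only lower the energy, it converges to a local rather than a global minimum, which is why stronger inference methods are usually preferred in practice.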