Abstract

In video object co-segmentation, patch-level matching is widely used to measure the similarity between video frames. However, patch-level matching reduces the precision of pixel localization and can easily lead to pixel misclassification, which degrades segmentation accuracy. To address this problem, we propose a framework based on deep neural networks and equipped with a new attention module designed for pixel-level matching, which segments the common objects across video frames. In this attention module, a pixel-level matching step compares the feature value of each pixel in one input frame with that of each pixel in another input frame to compute the similarity between the two frames. A feature fusion step then efficiently fuses each frame's feature maps with this similarity information to generate dense attention features. Finally, an up-sampling step uses these dense attention features to refine the feature maps and obtain high-quality segmentation results. The ObMiC and DAVIS 2016 datasets were used to train and test our framework. Experimental results show that our framework achieves higher accuracy than other video segmentation methods that perform well in common-information extraction.
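As a concrete illustration, the following sketch shows one way the pixel-level matching and feature fusion steps could be realized. It is a minimal PyTorch sketch under our own assumptions: the function names, dot-product similarity, softmax weighting, and channel concatenation are illustrative choices, not the authors' implementation.

    import torch
    import torch.nn.functional as F

    def pixel_level_matching(f_a, f_b):
        # Compare the feature value of every pixel of f_a with that of every
        # pixel of f_b; entry (i, j) scores pixel i of frame a against
        # pixel j of frame b.
        c, h, w = f_a.shape
        v_a = f_a.reshape(c, h * w)                     # (C, HW)
        v_b = f_b.reshape(c, h * w)                     # (C, HW)
        return v_a.t() @ v_b                            # (HW, HW) affinity matrix

    def fuse_features(f_a, f_b, affinity):
        # Fuse f_a with similarity-weighted features of f_b, producing the
        # dense attention features consumed by the up-sampling step.
        c, h, w = f_a.shape
        weights = F.softmax(affinity, dim=1)            # normalize over f_b's pixels
        attended = weights @ f_b.reshape(c, h * w).t()  # (HW, C)
        attended = attended.t().reshape(c, h, w)        # back to (C, H, W)
        return torch.cat([f_a, attended], dim=0)        # (2C, H, W) dense features

    # Usage on two illustrative frame feature maps:
    f_a = torch.randn(64, 32, 32)
    f_b = torch.randn(64, 32, 32)
    dense = fuse_features(f_a, f_b, pixel_level_matching(f_a, f_b))  # (128, 32, 32)

In this reading, the (HW, HW) affinity matrix carries one similarity score per pixel pair, and the fused (2C, H, W) tensor is what the up-sampling step would refine into the final segmentation.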

Highlights

  • Video object co-segmentation refers to the process of jointly segmenting the common objects from two or more video frames

  • (1) We develop a new attention module with pixel-level matching so that the similarity information between video frames can be effectively utilized to increase accuracy in video co-segmentation; (2) we build a deep learning framework that integrates the new attention module to extract accurate features and generate reliable segmentation results

  • The pipeline of our attention module is shown in Figure 2. v_{a,1} and v_{b,1}, respectively, denote the feature maps that are reshaped and inflated from f_a and f_b; v_{a,2} and v_{b,2}, respectively, denote the feature maps that are inflated only at the channel level from f_a and f_b


Summary

Introduction

Video object co-segmentation refers to the process of jointly segmenting the common objects from two or more video frames. The reshape and inflation operations are useful when computing the similarity between two frames, since they keep the two frames' feature maps the same size; we apply these strategies to capture the common information between the two frames' feature maps at the pixel level (see the sketch below). We design a new attention module with pixel-level matching to obtain high-quality similarity features between two video frames and to generate accurate segmentation results. Our contributions are twofold: (1) we develop a new attention module with pixel-level matching so that the similarity information between video frames can be effectively utilized to increase accuracy in video co-segmentation; (2) we build a deep learning framework that integrates the new attention module to extract accurate features and generate reliable segmentation results.
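The sketch below illustrates the reshape and inflation operations as we read them. Variable names such as v_a1 mirror the paper's v_{a,1} notation, and the broadcasting-based pairing and tensor sizes are our assumptions rather than the authors' exact code.

    import torch

    c, h, w = 8, 16, 16                    # illustrative feature-map size
    f_a = torch.randn(c, h, w)             # features of frame a
    f_b = torch.randn(c, h, w)             # features of frame b
    n = h * w

    # Reshape each map so every pixel becomes a length-C column, then inflate
    # (broadcast) so both tensors share the same (C, n, n) size.
    v_a1 = f_a.reshape(c, n, 1).expand(c, n, n)   # pixel i of frame a repeated along dim 2
    v_b1 = f_b.reshape(c, 1, n).expand(c, n, n)   # pixel j of frame b repeated along dim 1

    # With matching sizes, pixel i of frame a is compared with pixel j of
    # frame b elementwise, then reduced over channels:
    similarity = (v_a1 * v_b1).sum(dim=0)         # (n, n) pixel-pair similarity

    # Sanity check: identical to the matrix-product form of the affinity.
    assert torch.allclose(similarity, f_a.reshape(c, n).t() @ f_b.reshape(c, n))

The point of the inflation is that, once both feature maps occupy the same (C, n, n) shape, every pixel pair can be compared with a plain elementwise operation instead of an explicit double loop.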

Related Work
The Pipeline of Our Framework
Background mask
Our Attention Module
Experiments
Ablation Study
Comparisons with the State-of-the-Art Methods
Conclusions