Abstract

Existing matching-based video object segmentation (VOS) methods commonly maintain a reference set of previously segmented frames, referred to as ‘memory frames’, to guide segmentation of the current frame. However, these methods suffer from two limitations: (i) segmentation errors inherent in memory frames can propagate and accumulate when those frames are used as templates for subsequent segmentation, and (ii) the non-local matching employed by leading solutions often ignores positional information, which can lead to incorrect matches. In this paper, we introduce the Modulated Memory Network (MMN) for VOS. MMN enhances matching-based VOS in two ways: (i) an Importance Modulator adjusts memory frames using adaptive weight maps generated from the segmentation confidence associated with each frame, and (ii) a Position Modulator encodes spatial and temporal positional information for both the memory frames and the current frame. The Position Modulator improves matching accuracy by embedding positional information, while the Importance Modulator mitigates error propagation and accumulation through confidence-based modulation. Extensive experiments demonstrate the effectiveness of the proposed MMN, which achieves promising performance on VOS benchmarks.
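
The abstract describes the two modulators only at a high level. The PyTorch sketch below is one plausible instantiation under stated assumptions, not the authors' implementation: the confidence head, the sinusoidal spatio-temporal encoding, and all names (ImportanceModulator, spatiotemporal_pe) are illustrative.

```python
import math

import torch
import torch.nn as nn


class ImportanceModulator(nn.Module):
    """Confidence-based modulation of memory-frame features (illustrative).

    A small conv head maps a per-pixel confidence map to an adaptive weight
    map in (0, 1), which down-weights unreliable memory regions.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.weight_head = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),  # adaptive weight map in (0, 1)
        )

    def forward(self, mem_feat: torch.Tensor, confidence: torch.Tensor) -> torch.Tensor:
        # mem_feat:   (B, C, H, W) features of one memory frame
        # confidence: (B, 1, H, W) segmentation confidence of that frame,
        #             e.g. the max softmax probability of its stored mask
        weight = self.weight_head(confidence)
        return mem_feat * weight


def spatiotemporal_pe(c: int, h: int, w: int, t: int, num_frames: int,
                      device: torch.device) -> torch.Tensor:
    """Sinusoidal spatial encoding plus a scalar temporal phase (illustrative)."""
    assert c % 4 == 0, "channel count must be divisible by 4"
    quarter = c // 4
    freqs = torch.exp(torch.arange(quarter, device=device).float()
                      * (-math.log(10000.0) / quarter))
    y = torch.arange(h, device=device).float()[None, :] * freqs[:, None]  # (quarter, H)
    x = torch.arange(w, device=device).float()[None, :] * freqs[:, None]  # (quarter, W)
    pe = torch.cat([
        torch.sin(y)[:, :, None].expand(-1, -1, w),
        torch.cos(y)[:, :, None].expand(-1, -1, w),
        torch.sin(x)[:, None, :].expand(-1, h, -1),
        torch.cos(x)[:, None, :].expand(-1, h, -1),
    ], dim=0)  # (C, H, W)
    # Shift the whole map by a frame-index-dependent phase so features from
    # different frames carry distinct temporal signatures.
    return pe + math.sin(2.0 * math.pi * t / max(num_frames, 1))


if __name__ == "__main__":
    modulator = ImportanceModulator(channels=64)
    mem = torch.randn(2, 64, 24, 24)   # memory-frame features
    conf = torch.rand(2, 1, 24, 24)    # per-pixel confidence
    modulated = modulator(mem, conf)
    modulated = modulated + spatiotemporal_pe(64, 24, 24, t=3, num_frames=10,
                                              device=mem.device)
    print(modulated.shape)             # torch.Size([2, 64, 24, 24])
```

In this reading, both modulations are applied to memory features before non-local matching, so unreliable regions contribute less and matches become position-aware.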
