Currently popular video object segmentation algorithms based on memory networks update the memory pool with frame information indiscriminately and fail to make reasonable use of historical frame information. This leads to redundant frame information in the memory pool and increases the amount of computation. At the same time, their mask refinement is relatively coarse, which blurs the edges of the generated mask. To solve these problems, this paper proposes a video object segmentation algorithm based on dynamic perception update and feature fusion. To make reasonable use of historical frame information, a dynamic perception update module is proposed to selectively update the segmented frame masks. In addition, a mask refinement module is established to enhance the detail information in the shallow features of the backbone network. This module uses a double-kernel fusion block to fuse feature information at different scales, and finally applies the Laplacian operator to sharpen the edges of the mask. Experimental results show that on the public datasets DAVIS2016, DAVIS2017 and YouTube-VOS18, the comprehensive performance of the proposed algorithm reaches 86.9%, 79.3% and 71.6%, respectively, and the segmentation speed reaches 15 FPS on the DAVIS2016 dataset. Compared with many mainstream algorithms of recent years, it shows a clear advantage in performance.
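The final edge-sharpening step can be illustrated with a minimal sketch. The abstract does not state which Laplacian discretisation, border handling or weighting is used, so the 4-neighbour kernel, the replicated borders, the `strength` parameter and the function name `laplacian_sharpen` below are all assumptions made for illustration, not the paper's implementation.

```python
# 4-neighbour Laplacian kernel (an assumed discretisation; the paper's choice is not stated).
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]


def laplacian_sharpen(mask, strength=1.0):
    """Sharpen a 2-D soft mask (values in [0, 1]) by subtracting its Laplacian.

    `mask` is a list of equal-length rows of floats. `strength` is a
    hypothetical weight on the Laplacian term.
    """
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            lap = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Replicate border pixels by clamping the indices.
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    lap += LAPLACIAN[dy + 1][dx + 1] * mask[yy][xx]
            # Subtracting the Laplacian steepens transitions; clip back to [0, 1].
            out[y][x] = min(max(mask[y][x] - strength * lap, 0.0), 1.0)
    return out


# A blurred vertical edge: each row ramps from background (0) to object (1).
soft_edge = [[0.0, 0.25, 0.75, 1.0] for _ in range(3)]
sharp_edge = laplacian_sharpen(soft_edge)
```

With these assumptions, each row of `soft_edge` (0, 0.25, 0.75, 1) becomes 0, 0, 1, 1 in `sharp_edge`, while a constant mask is left unchanged because its Laplacian is zero everywhere.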