Abstract

Most existing deep learning-based motion segmentation methods treat motion segmentation as a binary segmentation problem, which is generally not the case in dynamic scenes. In addition, object and camera motion are often mixed, making the motion segmentation problem difficult. This paper proposes a joint learning method that fuses semantic features and motion cues using CNNs with a deformable convolution module and a motion embedding module to address the multi-object motion segmentation problem. The deformable convolution module serves to fuse color and motion information, while the motion embedding module learns to distinguish objects' motion status, drawing inspiration from geometric modeling methods. We perform extensive quantitative and qualitative experiments on benchmark datasets. In particular, we label over 9000 images of the KITTI visual odometry dataset to help train the deformable module. Our method achieves superior performance compared with the current state of the art in terms of both speed and accuracy.
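
The architectural details are not included on this page; the following is a minimal PyTorch sketch of one way such a deformable-convolution fusion step could be built, using torchvision.ops.DeformConv2d. The module name FusionBlock, the channel sizes, and the choice of predicting the sampling offsets from the flow branch are illustrative assumptions, not the authors' actual design.

```python
# Hedged sketch: fusing appearance (color) features with motion (flow) features
# through a deformable convolution whose sampling offsets come from the flow
# branch. All names, channel sizes, and wiring are assumptions for illustration.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class FusionBlock(nn.Module):
    def __init__(self, rgb_ch=64, flow_ch=32, out_ch=64, k=3):
        super().__init__()
        # 2 offset values (dx, dy) per kernel location, predicted from flow features
        self.offset_pred = nn.Conv2d(flow_ch, 2 * k * k, kernel_size=3, padding=1)
        self.deform = DeformConv2d(rgb_ch + flow_ch, out_ch,
                                   kernel_size=k, padding=k // 2)

    def forward(self, rgb_feat, flow_feat):
        offsets = self.offset_pred(flow_feat)           # (N, 2*k*k, H, W)
        fused = torch.cat([rgb_feat, flow_feat], dim=1)
        return self.deform(fused, offsets)              # (N, out_ch, H, W)

# Toy usage with random feature maps
rgb = torch.randn(1, 64, 48, 160)
flow = torch.randn(1, 32, 48, 160)
out = FusionBlock()(rgb, flow)                          # -> (1, 64, 48, 160)
```

Letting the motion branch steer the sampling grid is one plausible reading of fusing color and motion information with deformable convolution: the kernel can then follow motion boundaries instead of a fixed square neighborhood.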

Highlights

  • Motion segmentation, a key challenge in computer vision, aims at partitioning an image into regions of homogeneous motion at the pixel level in moving-camera videos

  • Inspired by the geometric studies underlying traditional motion segmentation methods, we propose a novel network module that performs model selection and motion representation in a single step, in which the motion representations are encoded as fixed-length embeddings containing geometric motion information from the optical flow (a sketch of one possible encoder follows this list)

  • The evaluation metrics we use are F-score and intersection over union (IoU), which are common in motion segmentation (a short computation sketch also follows this list)
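
The embedding architecture itself is not described on this page. As a purely illustrative sketch, assume a small convolutional encoder that maps the 2-channel optical flow to unit-length, fixed-dimension per-pixel embeddings, which can then be compared by cosine similarity to separate differently moving regions; the name MotionEmbedding and the embedding dimension are hypothetical.

```python
# Hedged sketch: encoding optical flow into fixed-length per-pixel motion
# embeddings. Architecture and dimensions are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionEmbedding(nn.Module):
    def __init__(self, embed_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, embed_dim, 1),
        )

    def forward(self, flow):                 # flow: (N, 2, H, W)
        emb = self.encoder(flow)             # (N, D, H, W)
        return F.normalize(emb, dim=1)       # unit-length embeddings

# Pixels undergoing the same motion should land close together in embedding
# space, so a cosine-similarity map against a reference pixel hints at grouping.
flow = torch.randn(1, 2, 48, 160)
emb = MotionEmbedding()(flow)
ref = emb[:, :, 24, 80].view(1, -1, 1, 1)
similarity = (emb * ref).sum(dim=1)          # (1, 48, 160), values in [-1, 1]
```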
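
For reference, the two metrics can be computed from binary masks as below; these are the standard definitions, and the paper's exact evaluation protocol (for example per-object or per-sequence averaging) may differ.

```python
# Standard binary-mask IoU and F-score; the averaging protocol may differ from the paper's.
import numpy as np

def iou_and_fscore(pred, gt):
    """pred, gt: boolean arrays of the same shape (H, W)."""
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    iou = tp / (tp + fp + fn + 1e-9)                 # overlap / union
    fscore = 2 * tp / (2 * tp + fp + fn + 1e-9)      # harmonic mean of precision and recall
    return iou, fscore

pred = np.zeros((4, 4), dtype=bool); pred[:2, :] = True
gt = np.zeros((4, 4), dtype=bool); gt[:, :2] = True
print(iou_and_fscore(pred, gt))                      # approximately (0.333, 0.5)
```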



Introduction

Motion segmentation, a key challenge in computer vision, aims at partitioning an image into regions of homogeneous motion at the pixel level in moving-camera videos. Most state-of-the-art algorithms [5]–[9] geometrically model the motion of cameras, scenes, and objects and group pixels according to the geometric motion model. Reference [11] casts the motion segmentation problem as point-trajectory clustering and solves it with multicut optimization. Another set of methods [12], [13] analyzes optical flow between a pair of frames to group pixels with similar motion.
