Abstract

Most existing deep learning-based motion segmentation methods treat motion segmentation as a binary segmentation problem, which is generally not the case in dynamic scenes. In addition, object and camera motion are often mixed, making the motion segmentation problem difficult. This paper proposes a joint learning method that fuses semantic features and motion cues using CNNs with a deformable convolution module and a motion embedding module to address the multi-object motion segmentation problem. The deformable convolution module serves to fuse color and motion information, while the motion embedding module learns to distinguish objects' motion status, drawing inspiration from geometric modeling methods. We perform extensive quantitative and qualitative experiments on benchmark datasets. In particular, we label over 9000 images of the KITTI visual odometry dataset to help train the deformable module. Our method achieves superior performance compared with the current state of the art in terms of both speed and accuracy.
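
The architectural details are not included on this page; the following is a minimal PyTorch sketch of one way such a deformable-convolution fusion step could be built, using torchvision.ops.DeformConv2d. The module name FusionBlock, the channel sizes, and the choice of predicting the sampling offsets from the flow branch are illustrative assumptions, not the authors' actual design.

```python
# Hedged sketch: fusing appearance (color) features with motion (flow) features
# through a deformable convolution whose sampling offsets come from the flow
# branch. All names, channel sizes, and wiring are assumptions for illustration.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class FusionBlock(nn.Module):
    def __init__(self, rgb_ch=64, flow_ch=32, out_ch=64, k=3):
        super().__init__()
        # 2 offset values (dx, dy) per kernel location, predicted from flow features
        self.offset_pred = nn.Conv2d(flow_ch, 2 * k * k, kernel_size=3, padding=1)
        self.deform = DeformConv2d(rgb_ch + flow_ch, out_ch,
                                   kernel_size=k, padding=k // 2)

    def forward(self, rgb_feat, flow_feat):
        offsets = self.offset_pred(flow_feat)           # (N, 2*k*k, H, W)
        fused = torch.cat([rgb_feat, flow_feat], dim=1)
        return self.deform(fused, offsets)              # (N, out_ch, H, W)

# Toy usage with random feature maps
rgb = torch.randn(1, 64, 48, 160)
flow = torch.randn(1, 32, 48, 160)
out = FusionBlock()(rgb, flow)                          # -> (1, 64, 48, 160)
```

Letting the motion branch steer the sampling grid is one plausible reading of fusing color and motion information with deformable convolution: the kernel can then follow motion boundaries instead of a fixed square neighborhood.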

Highlights

  • Motion segmentation, a key challenge in computer vision, aims at partitioning an image into regions of homogeneous motion at the pixel level in moving-camera videos

  • Inspired by the geometric studies underlying traditional motion segmentation methods, we propose a novel network module that performs model selection and motion representation in a single step, in which the motion representations are encoded as fixed-length embeddings containing geometric motion information from the optical flow (a sketch of one possible encoder follows this list)

  • The evaluation metrics we use are F-score and intersection over union (IoU), which are common in motion segmentation (a short computation sketch also follows this list)
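
The embedding architecture itself is not described on this page. As a purely illustrative sketch, assume a small convolutional encoder that maps the 2-channel optical flow to unit-length, fixed-dimension per-pixel embeddings, which can then be compared by cosine similarity to separate differently moving regions; the name MotionEmbedding and the embedding dimension are hypothetical.

```python
# Hedged sketch: encoding optical flow into fixed-length per-pixel motion
# embeddings. Architecture and dimensions are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionEmbedding(nn.Module):
    def __init__(self, embed_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, embed_dim, 1),
        )

    def forward(self, flow):                 # flow: (N, 2, H, W)
        emb = self.encoder(flow)             # (N, D, H, W)
        return F.normalize(emb, dim=1)       # unit-length embeddings

# Pixels undergoing the same motion should land close together in embedding
# space, so a cosine-similarity map against a reference pixel hints at grouping.
flow = torch.randn(1, 2, 48, 160)
emb = MotionEmbedding()(flow)
ref = emb[:, :, 24, 80].view(1, -1, 1, 1)
similarity = (emb * ref).sum(dim=1)          # (1, 48, 160), values in [-1, 1]
```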
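
For reference, the two metrics can be computed from binary masks as below; these are the standard definitions, and the paper's exact evaluation protocol (for example per-object or per-sequence averaging) may differ.

```python
# Standard binary-mask IoU and F-score; the averaging protocol may differ from the paper's.
import numpy as np

def iou_and_fscore(pred, gt):
    """pred, gt: boolean arrays of the same shape (H, W)."""
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    iou = tp / (tp + fp + fn + 1e-9)                 # overlap / union
    fscore = 2 * tp / (2 * tp + fp + fn + 1e-9)      # harmonic mean of precision and recall
    return iou, fscore

pred = np.zeros((4, 4), dtype=bool); pred[:2, :] = True
gt = np.zeros((4, 4), dtype=bool); gt[:, :2] = True
print(iou_and_fscore(pred, gt))                      # approximately (0.333, 0.5)
```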



Introduction

Motion segmentation, a key challenge in computer vision, aims at partitioning an image into regions of homogeneous motion at the pixel level in moving-camera videos. Most state-of-the-art algorithms [5]–[9] geometrically model the motion of cameras, scenes, and objects and group pixels according to the geometric motion model. Reference [11] casts the motion segmentation problem as point-trajectory clustering and solves it with multicut optimization. Another set of methods [12], [13] analyzes optical flow between a pair of frames to group pixels with similar motion.
