Abstract
Large-scale datasets with dense optical flow of non-rigid motion derived from real-world imagery are virtually non-existent today. The reason lies mainly in the setup required to derive ground-truth optical flow: a series of images with known camera poses along their trajectory, and an accurate 3D model of a textured scene. Human annotation is not only too tedious for large databases, it can also hardly yield accurate optical flow. To circumvent the need for manual annotation, we propose a framework to automatically generate optical flow from real-world videos. The method extracts and matches objects from video frames to compute initial constraints, and applies a deformation over the objects of interest to obtain dense optical flow fields. We propose several ways to augment the optical flow variations. Extensive experimental results show that training FlowNet-S, LiteFlowNet, PWC-Net, and RAFT on our automatically generated optical flow outperforms training them on rigid synthetic data. Datasets and the implementation of our optical flow generation framework are released at https://github.com/lhoangan/arap_flow.
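The sketch below illustrates, under simplifying assumptions, the generation idea described in the abstract: sparse point matches on a segmented object act as constraints, and a dense flow field is interpolated over the object mask. The inverse-distance interpolation is a simplified stand-in for the paper's deformation step (the released framework uses an as-rigid-as-possible deformation); all function and variable names are illustrative and are not the repository's API.

```python
import numpy as np

def densify_flow(mask, src_pts, dst_pts, eps=1e-6):
    """Interpolate sparse point correspondences into a dense flow field.

    mask    : (H, W) bool array marking the object of interest
    src_pts : (N, 2) control-point coordinates (x, y) in the first frame
    dst_pts : (N, 2) matched coordinates in the second frame
    returns : (H, W, 2) dense flow, zero outside the mask
    """
    h, w = mask.shape
    sparse_flow = dst_pts - src_pts                      # (N, 2) displacements at control points
    ys, xs = np.nonzero(mask)                            # pixels to fill in
    pix = np.stack([xs, ys], axis=1).astype(np.float64)  # (M, 2)

    # Inverse-distance weights between every masked pixel and every control point;
    # a stand-in for the as-rigid-as-possible deformation used in the actual framework.
    d2 = ((pix[:, None, :] - src_pts[None, :, :]) ** 2).sum(-1)  # (M, N) squared distances
    wgt = 1.0 / (d2 + eps)
    wgt /= wgt.sum(axis=1, keepdims=True)

    flow = np.zeros((h, w, 2))
    flow[ys, xs] = wgt @ sparse_flow                     # weighted blend of the constraints
    return flow

# Toy usage: a square object translated by (3, -2) pixels.
mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 20:40] = True
src = np.array([[20, 20], [39, 20], [20, 39], [39, 39]], dtype=np.float64)
dst = src + np.array([3.0, -2.0])
flow = densify_flow(mask, src, dst)
print(flow[30, 30])  # approximately [3., -2.] inside the object, zeros outside
```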
Highlights
Optical flow estimation has gained significant progress with the emergence of convolutional neural networks (CNNs)
While the KITTI datasets are the largest optical flow datasets of real-world images available today, they contain only 200 pairs of frames with sparse flow fields, which is insufficient for supervised training of optical flow prediction CNNs
From these extensive yet initial analyses of various design choices, we conclude that the generated datasets with non-rigid optical flow fields are well suited for training CNNs for optical flow prediction
Summary
Optical flow estimation has gained significant progress with the emergence of convolutional neural networks (CNNs). While the KITTI datasets are the largest optical flow datasets of real-world images available today, they contain only 200 pairs of frames with sparse flow fields, which is insufficient for supervised training of optical flow prediction CNNs. To resolve the data demand of CNNs, synthetic (generated) data is often used. A well-known synthetic dataset of optical flow is MPI-Sintel (Butler et al., 2012), which uses images and annotations rendered from a computer-generated movie called Sintel. This dataset contains non-rigid optical flow and serves as a well-established basis for comparing CNNs. However, for fully supervised training of CNNs, the number of frames in MPI-Sintel (around 2K) is still insufficient. Our method generates large amounts of optical flow data consisting of natural textures and non-rigid motions, which can be used for training CNNs designed for optical flow estimation.
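As a minimal illustration of how such generated flow fields supervise a flow network, the snippet below computes the average endpoint error (EPE), the standard training and evaluation signal for optical flow prediction. The names are illustrative; the networks cited above (FlowNet-S, LiteFlowNet, PWC-Net, RAFT) each add their own multi-scale or iterative weighting on top of this basic loss.

```python
import numpy as np

def endpoint_error(pred_flow, gt_flow):
    """Average Euclidean distance between predicted and ground-truth flow vectors.

    pred_flow, gt_flow : (H, W, 2) arrays of (u, v) displacements
    """
    return np.linalg.norm(pred_flow - gt_flow, axis=-1).mean()

# Toy check: a prediction off by one pixel horizontally everywhere has EPE = 1.
gt = np.zeros((4, 4, 2))
pred = gt + np.array([1.0, 0.0])
print(endpoint_error(pred, gt))  # 1.0
```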