Abstract
In this paper, two novel and practical regularization methods are proposed to improve existing neural network architectures for monocular optical flow estimation. The proposed methods exploit contextual information to alleviate deficiencies of current methods, such as flow leakage across object boundaries and inconsistent motion within rigid objects. More specifically, the first regularization method uses semantic information during training to explicitly regularize the predicted optical flow field. The novelty of this method lies in the use of semantic segmentation masks to teach the network to implicitly identify the semantic edges of an object and to reason better about local motion. A novel loss function is introduced that takes into account object boundaries, as derived from the semantic segmentation mask, to selectively penalize motion inconsistency within an object. The method is architecture-agnostic and can be integrated into any neural network without modifying it or adding complexity at inference. The second regularization method adds spatial awareness to the network's input data in order to improve training stability and efficiency. The coordinates of each pixel are used as additional features, breaking the spatial invariance properties of the network architecture. These additional features are shown to implicitly regularize the optical flow estimation, enforcing a consistent flow while improving both performance and convergence time. Finally, combining both regularization methods further improves the performance of existing cutting-edge architectures in a complementary way, both quantitatively and qualitatively, on popular flow estimation benchmark datasets.
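The idea behind the semantic loss can be illustrated with a minimal sketch. The exact formulation is not given in this summary, so the code below assumes a simple edge-aware smoothness term: the hypothetical function `semantic_smoothness_loss` penalizes the L1 difference between the flow vectors of neighbouring pixels only when both pixels carry the same segmentation label, and skips pairs that straddle an object boundary. Because the mask is consumed only by the loss, a term like this would be used during training alone, which is consistent with the claim that inference is unaffected.

```python
from typing import List, Tuple

# Hypothetical sketch, not the paper's exact loss.
Flow = List[List[Tuple[float, float]]]  # H x W grid of (u, v) flow vectors
Mask = List[List[int]]                  # H x W grid of semantic class labels


def semantic_smoothness_loss(flow: Flow, seg: Mask) -> float:
    """Mean L1 flow difference between neighbouring pixels that share a label.

    Pairs that straddle a semantic boundary are skipped, so motion
    discontinuities at object edges are not penalized.
    """
    h, w = len(flow), len(flow[0])
    total, pairs = 0.0, 0
    for y in range(h):
        for x in range(w):
            # Compare each pixel with its right and bottom neighbours.
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny >= h or nx >= w:
                    continue  # neighbour falls outside the image
                if seg[y][x] != seg[ny][nx]:
                    continue  # object boundary: allow the flow to change
                (u0, v0), (u1, v1) = flow[y][x], flow[ny][nx]
                total += abs(u0 - u1) + abs(v0 - v1)
                pairs += 1
    return total / pairs if pairs else 0.0
```

With a left object at rest and a right object moving 5 pixels to the right, the loss is 0.0 when the mask separates the two objects and 2.5 when the whole frame carries a single label, so the penalty falls only on motion inconsistency within an object.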
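The second method resembles the CoordConv technique of appending pixel coordinates to a network's input. The following is a minimal sketch, assuming coordinates normalized to [-1, 1] and appended after the existing image channels. Both choices, and the hypothetical name `add_coordinate_channels`, are assumptions for illustration rather than details taken from the paper.

```python
from typing import List

# Hypothetical sketch: names and normalization are illustrative assumptions.
Image = List[List[List[float]]]  # H x W x C, channels innermost


def _normalized(i: int, n: int) -> float:
    """Map index i in [0, n-1] to [-1, 1]; a single row/column maps to 0."""
    return 2.0 * i / (n - 1) - 1.0 if n > 1 else 0.0


def add_coordinate_channels(image: Image) -> Image:
    """Return a copy of the image with normalized x and y appended per pixel."""
    h, w = len(image), len(image[0])
    return [
        [image[y][x] + [_normalized(x, w), _normalized(y, h)] for x in range(w)]
        for y in range(h)
    ]
```

For a 2 × 3 single-channel input, the top-left pixel becomes `[value, -1.0, -1.0]` and the bottom-right pixel `[value, 1.0, 1.0]`, so every convolution sees where in the frame a pixel lies, not just its local appearance.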
Highlights
We show that semantically richer information can be used to improve flow estimation accuracy and motion consistency.
Summary
Optical flow estimation is a prerequisite step in a variety of computer vision problems, ranging from direct applications, such as object tracking [1], action recognition [2], motion analysis [3], and video stabilization [4], to more sophisticated ones, such as monocular depth estimation [5], multi-frame super-resolution [6], and 3D object reconstruction in immersive environments [7]. In its naive formulation, the computation of optical flow is an ill-posed problem, as there are multiple valid solutions. Training a Deep Neural Network (DNN) for a complex problem is generally a cumbersome procedure, highly sensitive to the training set and the parameters used. Regularization can be defined as any strategy employed to improve the training procedure of a neural network by imposing problem-specific restrictions. It has been shown that regularization can improve the …
Sensors 2020, 20, 3855; doi:10.3390/s20143855 www.mdpi.com/journal/sensors