Widening siamese architectures for stereo matching

Patrick Brandao, Evangelos Mazomenos, Danail Stoyanov

doi:10.1016/j.patrec.2018.12.002

Open Access

https://doi.org/10.1016/j.patrec.2018.12.002

Journal: Pattern Recognition Letters
Publication Date: Dec 4, 2018
Citations: 23
License type: CC BY

Affiliation: University College London

Abstract

Computational stereo is one of the classical problems in computer vision. Numerous algorithms have been reported in recent years, focusing on computing similarity, aggregating it to obtain spatial support, and finally optimizing an energy function to find the final disparity. In this paper, we focus on the feature extraction component of stereo matching architectures and show that standard CNN operations can be used to improve the quality of the features used to find point correspondences. Furthermore, we use a simple spatial aggregation that greatly simplifies the correlation learning problem, allowing us to better evaluate the quality of the extracted features. Our results on benchmark data are compelling and show promising potential even without refining the solution.

Highlights

Computational stereo is a classical problem in computer vision: two cameras placed at different viewpoints are used to extract 3D information by analyzing the relative positions of objects in the two views of the scene.
Since the first winning entry in the ImageNet Large Scale Visual Recognition Challenge, deep learning has been at the forefront of most computer vision breakthroughs [13].
Convolutional neural networks (CNNs) are widely used across different vision problems and in a vast range of applications, such as robotics and medical endoscopic imaging.

Summary

Introduction

Computational stereo is a classical problem in computer vision: two cameras placed at different viewpoints are used to extract 3D information by analyzing the relative positions of objects in the two views of the scene. Finding the relative displacements between image pairs from stereo cameras is usually called stereo matching [2,15]. Using the fundamental constraints of the two-view geometry of two perspective cameras, the stereo matching problem can be reduced to a 1D search along the rows of horizontally rectified images. Since the first winning entry in the ImageNet Large Scale Visual Recognition Challenge, deep learning has been at the forefront of most computer vision breakthroughs [13]. Deep learning models have recently been applied to stereo match-
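To make the 1D search in rectified images concrete, here is a minimal sketch of matching along a single scanline. It is not the method of this paper: it swaps in a classical sum-of-absolute-differences (SAD) cost over raw intensities where the paper uses learned CNN features, and the function name `match_scanline` and its parameters `max_disparity` and `half_window` are hypothetical names chosen for illustration.

```python
def match_scanline(left_row, right_row, max_disparity, half_window=1):
    """For each pixel x in left_row, pick the disparity d in [0, max_disparity]
    that minimises the SAD cost between a window around left_row[x] and a
    window around right_row[x - d]. Both rows must have the same length."""
    width = len(left_row)
    disparities = []
    for x in range(width):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disparity + 1):
            if x - d < 0:
                break  # candidate would fall outside the right image
            cost = 0
            for k in range(-half_window, half_window + 1):
                # Clamp window indices at the row borders.
                xl = min(max(x + k, 0), width - 1)
                xr = min(max(x - d + k, 0), width - 1)
                cost += abs(left_row[xl] - right_row[xr])
            if cost < best_cost:  # strict '<' keeps the smallest d on ties
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities


# Toy rectified pair: the left row is the right row shifted by 2 pixels.
right_row = [10, 50, 90, 30, 70, 20, 80, 40]
left_row = [10, 10, 10, 50, 90, 30, 70, 20]
print(match_scanline(left_row, right_row, max_disparity=3)[3:7])  # [2, 2, 2, 2]
```

Because rectification places corresponding points on the same image row, the search only visits `max_disparity + 1` candidates per pixel and never scans the full image. Only interior pixels are printed, since border clamping makes the first and last windows unreliable.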
