Abstract
The paper presents a novel depth-estimation method for light-field (LF) images that combines multi-stereo matching with machine-learning techniques. In the first stage, a block-based stereo matching algorithm computes the initial estimation. The algorithm is designed to operate on any pair of sub-aperture images (SAIs) in the LF image and to compute the pair's corresponding disparity map. For the central SAI, a disparity fusion technique is proposed to compute the initial disparity map from all available pairwise disparities. In the second stage, a pixel-wise deep-learning (DL)-based method for residual error prediction further refines the disparity estimation, using a novel neural network architecture built on a new structure of layers. The DL-based method predicts the residual error of the initial estimation and refines the final disparity map. The experimental results demonstrate the superiority of the proposed framework, showing average improvements in root mean squared error (RMSE), mean absolute error (MAE), and structural similarity index (SSIM) over machine-learning-based state-of-the-art methods.
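To make the two-stage pipeline concrete, the following Python sketch shows one way the pairwise matching, fusion, and residual-refinement steps could fit together. The function names, the injected match_pair and residual_net callables, and the per-pixel median fusion rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def estimate_central_disparity(sais, center, pair_keys, match_pair, residual_net):
    """Sketch of the two-stage pipeline (names and fusion rule are assumed).

    sais         : dict mapping (s, t) angular coordinates to HxW ndarray SAIs
    center       : (s, t) key of the central SAI
    match_pair   : callable(ref, other) -> HxW pairwise disparity map
                   (stands in for the block-based stereo matcher)
    residual_net : callable(ref, disp) -> HxW predicted residual error
                   (stands in for the pixel-wise DL model)
    """
    ref = sais[center]
    # Stage 1a: one disparity map per (central, other) SAI pair.
    pairwise = [match_pair(ref, sais[k]) for k in pair_keys]
    # Stage 1b: fuse all pairwise maps into the initial estimate;
    # a per-pixel median is one plausible fusion rule.
    initial = np.median(np.stack(pairwise), axis=0)
    # Stage 2: predict the residual error of the initial estimate
    # and add it back to refine the final disparity map.
    return initial + residual_net(ref, initial)
```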
Highlights
Light-field (LF) cameras were recently introduced in the image-processing and computer-vision domains to resolve the limitations of the conventional camera model.
In [25], we proposed a DL-based depth-estimation method, where a neural network computes the disparity of each pixel by processing 3D block patches extracted from epipolar plane images (EPIs); a sketch of such patch extraction follows these highlights.
The proposed method uses all available sub-aperture images (SAIs) when necessary. To achieve this goal, we propose to extend our method to take the epipolar lines of each stereo pair into account, because pairs that are not extracted from the same row or column are neither horizontally nor vertically registered, as the second sketch below illustrates.
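As a minimal sketch of the EPI-based representation mentioned in the first highlight, the snippet below extracts a patch around a pixel from a horizontal epipolar plane image. The 4D layout (S, T, H, W), the patch shape, and the function name are assumptions; [25] defines the exact 3D block structure.

```python
import numpy as np

def horizontal_epi_patch(lf, s_row, y, x, half=4):
    """Extract an EPI patch around pixel (y, x).

    lf is a 4D light field of shape (S, T, H, W): (s, t) index the
    angular grid of SAIs, (y, x) the pixels of each view (assumed
    layout). Fixing the angular row s_row and spatial row y gives a
    horizontal EPI of shape (T, W); the slopes of lines in this image
    encode disparity. Boundary handling is omitted in this sketch.
    """
    epi = lf[s_row, :, y, :]                 # (T, W) horizontal EPI
    return epi[:, x - half : x + half + 1]   # (T, 2*half + 1) patch
```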
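The second highlight's point about epipolar lines can be illustrated as follows: for two SAIs separated by an angular offset (ds, dt), a point with disparity d shifts by d*dt pixels horizontally and d*ds pixels vertically, so diagonal pairs are registered along neither axis. The shift convention and the integer rounding below are simplifying assumptions.

```python
import numpy as np

def shift_along_epipolar(img, ds, dt, d):
    """Warp one SAI toward another along their epipolar direction.

    For SAIs separated by angular offset (ds, dt), a scene point with
    disparity d moves by d*dt pixels in x and d*ds pixels in y
    (assumed convention). When both ds and dt are nonzero, the motion
    is diagonal, so such pairs are neither horizontally nor vertically
    registered and must be matched along this epipolar line.
    """
    dy, dx = d * ds, d * dt
    # Integer shift for illustration; a real matcher would use
    # sub-pixel interpolation.
    return np.roll(np.roll(img, round(dy), axis=0), round(dx), axis=1)
```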
Summary
Light-field (LF) cameras were recently introduced in the image-processing and computer-vision domains to resolve the limitations of the conventional camera model. Conventional cameras, which capture the red, green, and blue (RGB) primary colors, accumulate at each pixel position the color and light intensity of the incoming rays from all directions incident to the camera plane. In contrast to this model, LF cameras capture the intensity, color, and directional information of each light ray at each pixel position, yielding a 4D LF image for each acquisition. LF cameras, also known as plenoptic cameras, are implemented by placing an array of microlenses in front of the camera sensor. They serve as an alternative to the conventional camera-array paradigm for acquiring 4D LF images: camera-array systems are difficult to implement and handle, and the inherently large baselines between cameras yield substantial difficulties when handling occlusions in many applications.
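As a small illustration of the 4D model described above, the snippet below indexes a light field stored as an (S, T, H, W) array, where the angular axes come from the microlens grid. The 9x9 grid size and the axis ordering are assumptions chosen for the example.

```python
import numpy as np

# A 4D LF image L(s, t, y, x): the microlens array yields an (S, T)
# grid of viewpoints, each an HxW sub-aperture image (assumed layout).
S, T, H, W = 9, 9, 512, 512
lf = np.zeros((S, T, H, W), dtype=np.float32)   # placeholder data

central_sai = lf[S // 2, T // 2]   # the central SAI, shape (H, W)
angular_row = lf[S // 2]           # all T SAIs on the central angular row
```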