Depth estimation has significant potential in industrial applications such as robot navigation. Light Field (LF) technology captures both the spatial and angular information of a scene, enabling precise depth acquisition. Stereo matching has been widely used for this task in LF imaging. However, previous methods typically compute the matching cost by simply shifting images with predefined offsets, leading to poor performance in regions where the depth distribution changes abruptly (e.g., edges). To address this issue, we propose a Pixel-wise Matching Cost Function (PMCF) that estimates depth at the pixel level, and design a pixel-based network, termed PixelNet, for depth prediction, which achieves top rankings on the HCI 4D Light Field Benchmark. Specifically, our cost function consists of two modules: a range search strategy and a modulation mechanism. The range search strategy enables the network to iteratively optimize depth on a per-pixel basis, while the modulation mechanism effectively handles scene noise and occlusions. By integrating these two modules, our cost function produces depth maps with sharper edges and smoother surfaces than previous methods. Finally, we apply PixelNet to both synthetic and real-world scenes, demonstrating that our method outperforms state-of-the-art methods in both settings. Furthermore, PixelNet can also serve as a post-processing network that integrates seamlessly with existing methods for depth refinement.
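To make the contrast concrete, the sketch below illustrates the conventional shift-based matching cost that the abstract argues against: every pixel is warped by the same set of predefined disparity candidates, so regions with abrupt depth changes are poorly served. This is a minimal, illustrative plane-sweep-style implementation, not the paper's PMCF; the function and parameter names (`shift_based_cost_volume`, `view_offsets`, `disparities`) are assumptions introduced here for clarity.

```python
import numpy as np

def shift_based_cost_volume(center_view, side_views, view_offsets, disparities):
    """Conventional matching cost: shift side views by predefined disparities.

    center_view : (H, W) grayscale centre sub-aperture image
    side_views  : list of (H, W) grayscale side sub-aperture images
    view_offsets: list of (du, dv) angular offsets of each side view
                  relative to the centre view
    disparities : 1-D array of predefined disparity candidates
    Returns a (len(disparities), H, W) cost volume (lower = better match).
    """
    H, W = center_view.shape
    cost = np.zeros((len(disparities), H, W), dtype=np.float32)
    xs, ys = np.meshgrid(np.arange(W), np.arange(H))
    for d_idx, d in enumerate(disparities):
        for view, (du, dv) in zip(side_views, view_offsets):
            # The same candidate disparity d is applied to every pixel,
            # scaled by the angular offset of the side view.
            src_x = np.clip(xs + d * du, 0, W - 1)
            src_y = np.clip(ys + d * dv, 0, H - 1)
            # Nearest-neighbour warp for brevity; real pipelines interpolate.
            warped = view[src_y.astype(int), src_x.astype(int)]
            cost[d_idx] += np.abs(warped - center_view)
    return cost

# A depth label per pixel is then chosen as the disparity minimising the cost:
# depth_map = disparities[np.argmin(cost, axis=0)]
```

Because the candidate offsets are fixed and shared across the whole image, edge pixels whose true disparity falls between (or outside) the candidates are matched poorly; a per-pixel, iteratively refined search range, as the abstract describes, avoids this limitation.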