Traditional self-supervised monocular depth estimation models do not adequately extract and fuse shallow features, which can lead to missed detection of small objects and blurred object edges. To address these problems, this paper proposes a self-supervised monocular depth estimation model based on an improved dense network and wavelet decomposition. The model follows a U-Net structure: the encoder adopts an improved dense connection scheme to strengthen its feature extraction and fusion capabilities; a detail enhancement module is added to the skip connections to further refine and integrate the multi-scale features produced by the encoder; and wavelet decomposition is introduced in the decoder, forcing it to attend to high-frequency information and thereby sharpen object edges. Experimental results show that the proposed model captures small-object features more effectively and generates depth maps with clearer, more accurate edges.
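To make the decoder-side idea concrete, below is a minimal PyTorch sketch of one wavelet-based decoder stage using a plain Haar transform: the stage predicts the three high-frequency sub-bands from an encoder skip feature and reconstructs a depth map at twice the resolution via the inverse transform, so edge detail must be modeled explicitly. The module name `WaveletDecoderStage`, the channel widths, and the way sub-bands are predicted are all hypothetical illustrations; the abstract does not specify the actual architecture.

```python
import torch
import torch.nn as nn


def haar_idwt(ll, lh, hl, hh):
    """Inverse one-level 2D Haar transform: reconstructs a map at 2x
    resolution from the low-frequency (LL) and high-frequency
    (LH, HL, HH) sub-bands, each of shape (B, C, H, W)."""
    b, c, h, w = ll.shape
    out = ll.new_zeros(b, c, 2 * h, 2 * w)
    out[:, :, 0::2, 0::2] = (ll - lh - hl + hh) / 2
    out[:, :, 0::2, 1::2] = (ll - lh + hl - hh) / 2
    out[:, :, 1::2, 0::2] = (ll + lh - hl - hh) / 2
    out[:, :, 1::2, 1::2] = (ll + lh + hl + hh) / 2
    return out


class WaveletDecoderStage(nn.Module):
    """Hypothetical decoder stage: instead of plain upsampling, it
    predicts the three high-frequency Haar sub-bands from the encoder
    skip feature and applies the inverse transform to the coarse depth
    (treated as the LL band), yielding a 2x-resolution depth map."""

    def __init__(self, feat_ch):
        super().__init__()
        # One 1-channel prediction per high-frequency sub-band.
        self.to_highfreq = nn.Conv2d(feat_ch, 3, kernel_size=3, padding=1)

    def forward(self, coarse_depth, skip_feat):
        # coarse_depth: (B, 1, H, W); skip_feat: (B, feat_ch, H, W)
        lh, hl, hh = self.to_highfreq(skip_feat).chunk(3, dim=1)
        return haar_idwt(coarse_depth, lh, hl, hh)


if __name__ == "__main__":
    stage = WaveletDecoderStage(feat_ch=64)
    coarse = torch.rand(1, 1, 48, 64)   # low-resolution depth (LL band)
    skip = torch.rand(1, 64, 48, 64)    # encoder skip feature
    fine = stage(coarse, skip)
    print(fine.shape)                   # torch.Size([1, 1, 96, 128])
```

Because the high-frequency coefficients are zero almost everywhere except near depth discontinuities, supervising or predicting them directly concentrates the decoder's capacity on edges, which is the stated motivation for introducing wavelet decomposition in the decoder.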