Abstract

360° depth estimation has been extensively studied because 360° images provide a full field of view of the surrounding environment as well as a detailed description of the entire scene. However, most well‐studied convolutional neural networks (CNNs) for 360° depth estimation extract local features well but fail to capture rich global features from the panorama due to the fixed receptive field of CNNs. PCformer, a parallel convolutional transformer network that combines the benefits of CNNs and transformers, is proposed for 360° depth estimation. Transformers are inherently suited to modelling long‐range dependencies and extracting global features. With PCformer, both global dependencies and local spatial features can be efficiently captured. To fully incorporate global and local features, a dual attention fusion module is designed. In addition, a distortion‐weighted loss function is designed to mitigate the effect of distortion in panoramas. Extensive experiments demonstrate that the proposed method achieves competitive results against state‐of‐the‐art methods on three benchmark datasets. Additional experiments also demonstrate that the proposed model has benefits in terms of model complexity and generalisation capability.
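The abstract does not specify the exact form of the distortion‐weighted loss. A common choice for equirectangular panoramas, since rows near the poles are heavily over‐sampled, is to weight the per‐pixel error by the cosine of latitude. The following is a minimal PyTorch sketch under that assumption; the function name and the L1 base loss are illustrative, not the paper's definition:

```python
import math
import torch

def distortion_weighted_l1_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Hypothetical latitude-weighted L1 depth loss for equirectangular panoramas.

    pred, target: (B, 1, H, W) depth maps. Pixels near the poles are
    stretched by the equirectangular projection, so their errors are
    down-weighted by cos(latitude); equatorial rows receive full weight.
    """
    B, C, H, W = pred.shape
    # Latitude of each row at pixel centres: +pi/2 (top) to -pi/2 (bottom).
    rows = torch.arange(H, dtype=pred.dtype, device=pred.device)
    lat = (0.5 - (rows + 0.5) / H) * math.pi
    weight = torch.cos(lat).view(1, 1, H, 1)  # broadcast over batch, channel, width
    # Weighted mean absolute error, normalised by the total weight mass.
    return (weight * (pred - target).abs()).sum() / (weight.sum() * B * C * W)
```

Usage would follow the standard supervised pattern, e.g. `loss = distortion_weighted_l1_loss(model(rgb), gt_depth)` inside the training loop.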
