Abstract
Integrated aerial and ground-based observations are essential for rapid land-use classification in modern land surveys, and they require efficient analysis of large volumes of landscape images. These images often contain background noise that degrades classification accuracy, and the main scene is spatially compressed, which distorts the spatial layout of semantic content increasingly as depth increases. To address these challenges, this paper proposes a depth-based land-use semantic extraction (DLSE) method that reduces background noise and corrects spatial distortion. DLSE consists of three main steps: (1) per-pixel depth estimation with a multi-resolution neural network to delineate the range of the main scene and filter out semantic noise; (2) per-pixel conversion from perspective projection to orthographic projection to correct spatial distortion; and (3) improved land-use semantic extraction using a SegNet whose feature extraction is enhanced with MobileViT. Experimental results show that DLSE achieves 96.84% accuracy with more detailed outputs and runs at 23 fps, positioning it as an efficient tool for automated land surveys and land-use decision-making.
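Steps (1) and (2) can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: it assumes a depth map is already available (here a hand-written array rather than the output of the multi-resolution network), a simple pinhole camera model, and hypothetical values for the intrinsics `fx`, `fy`, `cx`, `cy` and the cutoff `max_depth`. The function names are likewise illustrative.

```python
import numpy as np

def main_scene_mask(depth, max_depth):
    """Keep pixels whose estimated depth lies within the main scene range.

    Assumption: the main scene is delimited by a single depth cutoff.
    """
    return np.isfinite(depth) & (depth > 0) & (depth <= max_depth)

def perspective_to_orthographic(depth, fx, fy, cx, cy):
    """Back-project every pixel to metric (x, y) coordinates.

    Assumption: pinhole camera with focal lengths fx, fy and principal
    point (cx, cy). Scaling the pixel offset by depth undoes the
    perspective compression of distant content.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column and row indices
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return x, y

# Toy 2x3 depth map in metres; the 50 m pixel stands in for distant background.
depth = np.array([[2.0, 2.0, 50.0],
                  [4.0, 4.0, 4.0]])

mask = main_scene_mask(depth, max_depth=10.0)
x, y = perspective_to_orthographic(depth, fx=1.0, fy=1.0, cx=1.0, cy=0.0)

print(mask)  # the 50 m pixel is excluded from the main scene
print(x)     # a one-pixel offset spans 2 m at 2 m depth and 4 m at 4 m depth
```

In the toy example, the same one-pixel horizontal offset maps to twice the metric distance when depth doubles, which is the compression that the orthographic conversion is meant to correct.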