The object of research is the integration of a dynamic receptive field attention module (DReAM) into Swin Transformers to improve scene localization and semantic segmentation in high-resolution remote sensing imagery. The study develops a model that adjusts its receptive field dynamically and uses attention mechanisms to strengthen multi-scale feature extraction. Traditional approaches, particularly convolutional neural networks (CNNs), have fixed receptive fields, which limit their ability to capture both fine details and long-range dependencies in large-scale remote sensing images. This limitation makes conventional models less effective on spatially complex, multi-scale objects and leads to errors in object segmentation and scene interpretation. The resulting DReAM-CAN model incorporates a dynamic receptive field scaling mechanism and a composite attention framework that combines CNN-based feature extraction with Swin Transformer self-attention. This design lets the model process objects of various sizes efficiently and capture both local textures and global scene context. As a result, the model significantly improves segmentation accuracy and spatial adaptability in remote sensing imagery. These results are explained by the model’s ability to modify its receptive fields according to scene complexity and object distribution. The self-attention mechanism further refines feature extraction by selectively enhancing relevant spatial dependencies, which mitigates noise and sharpens segmentation boundaries. The hybrid CNN-Transformer architecture balances computational efficiency against accuracy.
The DReAM-CAN model is applicable to the analysis of high-resolution satellite and aerial imagery, including environmental monitoring, land-use classification, forestry assessment, precision agriculture, and disaster impact analysis. Because it adapts to different scales and spatial complexities, it is well suited to real-time and large-scale remote sensing tasks that require precise scene localization and segmentation.
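
The idea of a receptive field that adapts to scene complexity can be illustrated with a small toy sketch. This is not the DReAM-CAN implementation: it works on a 1-D signal, uses fixed averaging filters in place of learned convolutions, and the function names, the complexity measure, and the gate weights are all hypothetical choices made for illustration. In a trained network the gate would be learned rather than hand-set.

```python
import math


def dilated_mean_filter(signal, dilation):
    """3-tap averaging filter with the given dilation; indices are clamped at the edges."""
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[max(i - dilation, 0)]
        right = signal[min(i + dilation, n - 1)]
        out.append((left + signal[i] + right) / 3.0)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_receptive_field(signal, dilations=(1, 2, 4),
                            gate_weights=(1.0, 0.0, -1.0)):
    """Blend branches with different receptive fields using input-dependent weights.

    gate_weights is a hand-set (hypothetical) gate: a higher complexity score
    shifts weight toward the small-dilation branch.
    """
    # One branch per dilation rate, i.e. per receptive field size.
    branches = [dilated_mean_filter(signal, d) for d in dilations]

    # Crude complexity descriptor: mean absolute difference between neighbours.
    pairs = list(zip(signal, signal[1:]))
    complexity = sum(abs(a - b) for a, b in pairs) / max(len(pairs), 1)

    # The branch weights depend on the input, so the effective receptive
    # field changes from one signal to the next.
    weights = softmax([w * complexity for w in gate_weights])

    fused = [sum(w * branch[i] for w, branch in zip(weights, branches))
             for i in range(len(signal))]
    return fused, weights


if __name__ == "__main__":
    _, smooth_weights = dynamic_receptive_field([2.0] * 8)
    _, busy_weights = dynamic_receptive_field([0.0, 4.0] * 4)
    print("smooth input:", smooth_weights)
    print("busy input:  ", busy_weights)
```

On a constant signal the three branches receive equal weight, while a rapidly alternating signal pushes most of the weight to the branch with the smallest dilation. This mirrors, in a very reduced form, the abstract's claim that the receptive field is adjusted to scene complexity.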