Forest loss significantly contributes to climate change, motivating efforts to understand its causes and to evaluate forest loss classification maps. While Vision Transformers (ViTs) have demonstrated superior performance to convolutional neural networks (CNNs) in many computer vision applications, they face challenges in recognizing and analyzing remote sensing images: higher computational complexity, which increases processing costs, and a need for more labeled training data than CNNs. To address these issues, this paper introduces the enhanced transformer UNet (e-TransUNet), which integrates a spatial transformation architecture to enhance feature description in the skip connection path of the attention TransUNet for precise deforestation mapping. The model is validated on two prominent South American forest biomes, the Amazon and Atlantic forests. Experimental results highlight the significant improvement achieved by e-TransUNet, particularly in the Atlantic rainforests, where it boosts recall, overall accuracy, F1-score, Dice coefficient, and precision by approximately 2, 2, 3, 4, and 4 percentage points, respectively, over the base TransUNet segmentation architecture. The code will be made publicly available at https://github.com/aj1365/e-TransUNet.
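The abstract's core idea, refining encoder features on the skip path before they are fused into the decoder, can be sketched in a framework-agnostic way. The actual spatial transformation module of e-TransUNet is not detailed in the abstract, so the `spatial_refine` function below is purely hypothetical (a depthwise 3x3 smoothing stands in for the real module); only the placement on the skip connection reflects the text.

```python
import numpy as np

def spatial_refine(feat):
    """Hypothetical stand-in for e-TransUNet's skip-path module.

    feat: encoder feature map of shape (C, H, W).
    Placeholder behavior: per-channel 3x3 average smoothing, used
    here only to show where a refinement block would sit.
    """
    c, h, w = feat.shape
    padded = np.pad(feat, ((0, 0), (1, 1), (1, 1)), mode="edge")
    out = np.zeros_like(feat)
    for dy in range(3):
        for dx in range(3):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / 9.0

def skip_connect(decoder_feat, encoder_feat):
    # UNet-style skip connection: concatenate the (refined) encoder
    # features with the decoder features along the channel axis.
    return np.concatenate([decoder_feat, spatial_refine(encoder_feat)], axis=0)

dec = np.ones((4, 8, 8), dtype=np.float32)
enc = np.ones((4, 8, 8), dtype=np.float32)
fused = skip_connect(dec, enc)
print(fused.shape)  # (8, 8, 8)
```

In a real implementation the refinement would be a learned layer (and the concatenation would carry a batch dimension); the sketch only illustrates that the transformation is applied to skip features before fusion, not to the decoder stream.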