CycleGAN with Atrous Spatial Pyramid Pooling and Attention-Enhanced MobileNetV4 for Tomato Disease Recognition Under Limited Training Data

Abstract

To address the challenges of poor model generalization and suboptimal recognition accuracy stemming from limited and imbalanced sample sizes in tomato leaf disease identification, this study proposes a novel recognition strategy. This approach synergistically combines an enhanced image augmentation method based on generative adversarial networks with a lightweight deep learning model. Initially, an Atrous Spatial Pyramid Pooling (ASPP) module is integrated into the CycleGAN framework. This integration enhances the generator’s capacity to model multi-scale pathological lesion features, thereby significantly improving the diversity and realism of synthesized images. Subsequently, the Convolutional Block Attention Module (CBAM), incorporating both channel and spatial attention mechanisms, is embedded into the MobileNetV4 architecture. This enhancement boosts the model’s ability to focus on critical disease regions. Experimental results demonstrate that the proposed ASPP-CycleGAN significantly outperforms the original CycleGAN across multiple disease image generation tasks. Furthermore, the developed CBAM-MobileNetV4 model achieves a remarkable average recognition accuracy exceeding 97% for common tomato diseases, including early blight, late blight, and mosaic disease, representing a 1.86% improvement over the baseline MobileNetV4. The findings indicate that the proposed method offers exceptional data augmentation capabilities and classification performance under small-sample learning conditions, providing an effective technical foundation for the intelligent identification and control of tomato leaf diseases.
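As a rough illustration of the multi-scale idea behind ASPP, the sketch below applies the same small kernel at several dilation rates in parallel and stacks the responses with a global-average branch. It is a deliberately simplified 1-D, pure-Python analogue and not the generator block used in the study: the function names, the kernel, and the rates (1, 2, 4) are assumptions chosen for readability, and a real ASPP module would use learned 2-D convolutions followed by a 1x1 fusion convolution.

```python
def dilated_conv1d(signal, kernel, rate):
    """Same-length 1-D convolution with zero padding and dilation `rate`."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + (k - half) * rate  # taps are spaced `rate` samples apart
            if 0 <= j < len(signal):   # positions outside the signal count as zero
                acc += weight * signal[j]
        out.append(acc)
    return out


def aspp_1d(signal, kernel, rates=(1, 2, 4)):
    """Stack parallel dilated branches plus a global-average branch.

    Returns one feature vector per position: one value per dilation rate,
    followed by the signal mean (the "image pooling" branch).
    The rates are illustrative assumptions, not values from the paper.
    """
    branches = [dilated_conv1d(signal, kernel, r) for r in rates]
    mean = sum(signal) / len(signal)
    branches.append([mean] * len(signal))
    return [list(column) for column in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    for position, features in enumerate(aspp_1d(impulse, [1, 1, 1])):
        print(position, features)
```

With an impulse input, the branch with rate 1 responds only next to the impulse while the branches with larger rates respond further away, which is the sense in which the parallel branches see lesions at different scales.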

Similar Papers
  • Research Article
  • Citations: 1
  • 10.3390/electronics13132427
Cable Conduit Defect Recognition Algorithm Based on Improved YOLOv8
  • Jun 21, 2024
  • Electronics
  • Fanfang Kong + 5 more

The underground cable conduit system, a vital component of urban power transmission and distribution infrastructure, faces challenges in maintenance and residue detection. Traditional detection methods, such as Closed-Circuit Television (CCTV), rely heavily on the expertise and prior experience of professional inspectors, which makes results slow to obtain and subjective. To address these issues and automate defect detection in underground cable conduits, this paper proposes a defect recognition algorithm based on an enhanced YOLOv8 model. Firstly, we replace the Spatial Pyramid Pooling-Fast (SPPF) module in the original model with the Atrous Spatial Pyramid Pooling (ASPP) module to capture multi-scale defect features effectively. Secondly, to enhance feature representation and reduce noise interference, we integrate the Convolutional Block Attention Module (CBAM) into the detection head. Finally, we enhance the YOLOv8 backbone network by replacing the C2f module with the base module of ShuffleNet V2, reducing the number of model parameters and optimizing the model efficiency. Experimental results demonstrate the efficacy of the proposed algorithm in recognizing pipe misalignment and residual foreign objects. The precision and mean average precision (mAP) reach 96.2% and 97.6%, respectively, representing improvements over the original YOLOv8 model. This study significantly improves the capability of capturing and characterizing defect characteristics, thereby enhancing the maintenance efficiency and accuracy of underground cable conduit systems.

  • Conference Article
  • Citations: 2
  • 10.1109/ccis57298.2022.10016433
Cascaded ASPP and Attention Mechanism-based Deeplabv3+ Semantic Segmentation Model
  • Nov 26, 2022
  • Shuaiping Guo + 1 more

Deeplabv3+ is a standard semantic segmentation model that adds a decoding structure to recover the spatial information of the image and uses the Atrous Spatial Pyramid Pooling (ASPP) module to handle the multi-scale problem of the image. However, the Deeplabv3+ model has some drawbacks in restoring details. Therefore, we propose the CB_Deeplabv3+ model. In the encoding structure of the CB_Deeplabv3+ model, we use ASPP modules cascaded in parallel to extend the network structure and enable the model to capture richer context information by increasing the information interaction between channels. At the same time, CB_Deeplabv3+ introduces the Convolutional Block Attention Module (CBAM) to address the long-distance dependence problem in the encoding-decoding structure. Experimental evaluation results on the Part_VOC dataset show that CB_Deeplabv3+ achieves excellent performance for semantic segmentation.

  • Conference Article
  • Citations: 3
  • 10.1109/igarss46834.2022.9883157
GAN with ASPP for SAR Image to Optical Image Conversion
  • Jul 17, 2022
  • Zikang Shao + 2 more

Researchers can gain more intuitive information by converting synthetic aperture radar (SAR) images to optical images using generative adversarial networks (GANs). However, existing GANs have poor feature extraction ability, which leads to color conversion errors and loss of details. To solve this problem, we add an atrous spatial pyramid pooling (ASPP) module to the GAN to enhance its feature extraction ability, yielding ASPP-GAN. The ASPP module can extract multi-resolution feature responses, enabling the network to focus on both overall and detailed features. The experimental results on the public SEN1-2 dataset show that ASPP-GAN significantly improves over the traditional GAN: Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) values are improved by about 20%.
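For readers unfamiliar with the PSNR figure quoted above, the following is a minimal sketch of the standard definition, PSNR = 10 log10(MAX² / MSE), applied to flat lists of pixel values. The function name and the default peak value of 255 (8-bit images) are assumptions for illustration; the cited work does not state its exact evaluation code.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and have the same length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    clean = [100, 100, 100, 100]
    noisy = [110, 90, 110, 90]
    print(f"PSNR: {psnr(clean, noisy):.2f} dB")
```

Because PSNR is logarithmic, a relative gain such as "about 20%" in dB corresponds to a much larger reduction in mean squared error than the percentage alone suggests.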

  • Research Article
  • 10.3390/aerospace12060542
Interactive Maintenance of Space Station Devices Using Scene Semantic Segmentation
  • Jun 15, 2025
  • Aerospace
  • Haoting Liu + 8 more

A novel interactive maintenance method for space station in-orbit devices using scene semantic segmentation technology is proposed. First, a wearable and handheld system is designed to capture images from the astronaut in the space station’s front view scene and play these images on a handheld terminal in real-time. Second, the proposed system quantitatively evaluates the environmental lighting condition in the scene by calculating image quality evaluation parameters. If the lighting condition is not proper, a prompt message will be given to the astronaut to remind him or her to adjust the environment illumination. Third, our system adopts an improved DeepLabV3+ network for semantic segmentation of these astronauts’ forward view scene images. Regarding the improved network, the original backbone network is replaced with a lightweight convolutional neural network, i.e., the MobileNetV2, with a smaller model scale and computational complexity. The convolutional block attention module (CBAM) is introduced to improve the network’s feature perception ability. The atrous spatial pyramid pooling (ASPP) module is also considered to enable an accurate calculation of encoding multi-scale information. Extensive simulation experiment results indicate that the accuracy, precision, and average intersection over the union of the proposed algorithm can be better than 95.0%, 96.0%, and 89.0%, respectively. And the ground application experiments have also shown that our proposed technique can effectively shorten the working time of the system user.

  • Research Article
  • Citations: 22
  • 10.1016/j.eswa.2023.119560
Sw-YoloX: An anchor-free detector based transformer for sea surface object detection
  • Jan 13, 2023
  • Expert Systems with Applications
  • Jiangang Ding + 5 more


  • Conference Article
  • Citations: 5
  • 10.23919/ccc52363.2021.9550003
Feature Extraction for Side Scan Sonar Image Based on Deep Learning
  • Jul 26, 2021
  • Yanghua Tang + 4 more

Feature extraction from sonar images faces several problems, such as strong speckle noise, low image resolution, poor image quality, and difficult target segmentation. To overcome the shortcomings of traditional algorithms in sonar image feature extraction, we apply the Mask RCNN instance segmentation network to sonar image feature extraction. The effectiveness of the deep learning algorithm is verified by experiments. First, we use online and offline data augmentation to expand the dataset. Then, we improve the residual network by adding the convolutional block attention module (CBAM), group normalization (GN), and an atrous spatial pyramid pooling (ASPP) module to the network. Finally, we choose the best network structure and add a focal loss function to improve semantic segmentation. The final experimental results show that the Average Precision (AP) and Mean Intersection over Union (mIoU) of the proposed network are improved compared with the original network on the side scan sonar dataset.

  • Research Article
  • 10.3390/rs17213563
Phenology-Guided Wheat and Corn Identification in Xinjiang: An Improved U-Net Semantic Segmentation Model Using PCA and CBAM-ASPP
  • Oct 28, 2025
  • Remote Sensing
  • Yang Wei + 6 more

Wheat and corn are two major food crops in Xinjiang. However, the spectral similarity between these crop types and the complexity of their spatial distribution have posed significant challenges to accurate crop identification. To this end, the study aimed to improve the accuracy of crop distribution identification in complex environments in three ways. First, by analysing the kNDVI and EVI time series, the optimal identification window was determined to be days 156–176, a period when wheat is in the grain-filling to milk-ripening phase and maize is in the jointing to tillering phase, during which the strongest spectral differences between the two crops occur. Second, principal component analysis (PCA) was applied to Sentinel-2 data. The top three principal components were extracted to construct the input dataset, effectively integrating visible and near-infrared band information. This approach suppressed redundancy and noise while replacing traditional RGB datasets. Finally, the Convolutional Block Attention Module (CBAM) was integrated into the U-Net model to enhance feature focusing on key crop areas. An improved Atrous Spatial Pyramid Pooling (ASPP) module based on deep separable convolutions was adopted to reduce the computational load while boosting multi-scale context awareness. The experimental results showed the following: (1) Wheat and corn exhibit obvious phenological differences between the 156th and 176th days of the year, which can be used as the optimal time window for identifying their spatial distributions. (2) The method proposed by this research had the best performance, with its mIoU, mPA, F1-score, and overall accuracy (OA) reaching 83.03%, 91.34%, 90.73%, and 90.91%, respectively. Compared to DeeplabV3+, PSPnet, HRnet, Segformer, and U-Net, the OA improved by 5.97%, 4.55%, 2.03%, 8.99%, and 1.5%, respectively. The recognition accuracy of the PCA dataset improved by approximately 2% compared to the RGB dataset. (3) This strategy still had high accuracy when predicting wheat and corn yields in Qitai County, Xinjiang, and had a certain degree of generalisability. In summary, the improved strategy proposed in this study holds considerable application potential for identifying the spatial distribution of wheat and corn in arid regions.

  • Research Article
  • 10.3390/electronics14142865
Enhancing Suburban Lane Detection Through Improved DeepLabV3+ Semantic Segmentation
  • Jul 17, 2025
  • Electronics
  • Shuwan Cui + 6 more

Lane detection is a key technology in automatic driving environment perception, and its accuracy directly affects vehicle positioning, path planning, and driving safety. In this study, an enhanced real-time model for lane detection based on an improved DeepLabV3+ architecture is proposed to address the challenges posed by complex dynamic backgrounds and blurred road boundaries in suburban road scenarios. To address the lack of feature correlation in the traditional Atrous Spatial Pyramid Pooling (ASPP) module of the DeepLabV3+ model, we propose an improved LC-DenseASPP module. First, inspired by DenseASPP, the number of dilated convolution layers is reduced from six to three by adopting a dense connection to enhance feature reuse, significantly reducing computational complexity. Second, the convolutional block attention module (CBAM) attention mechanism is embedded after the LC-DenseASPP dilated convolution operation. This effectively improves the model’s ability to focus on key features through the adaptive refinement of channel and spatial attention features. Finally, an image-pooling operation is introduced in the last layer of the LC-DenseASPP to further enhance the ability to capture global context information. DySample is introduced to replace bilinear upsampling in the decoder, ensuring model performance while reducing computational resource consumption. The experimental results show that the model achieves a good balance between segmentation accuracy and computational efficiency, with a mean intersection over union (mIoU) of 95.48% and an inference speed of 128 frames per second (FPS). Additionally, a new lane-detection dataset, SubLane, is constructed to fill the gap in the research field of lane detection in suburban road scenarios.

  • Conference Article
  • Citations: 7
  • 10.1109/spin57001.2023.10116882
Significance of Atrous Spatial Pyramid Pooling (ASPP) in Deeplabv3+ for Water Body Segmentation
  • Mar 23, 2023
  • Gosula Sunandini + 3 more

Water body segmentation is a difficult task because of complex shapes, edges, sizes, and surroundings. DeeplabV3+ has proven to perform well on segmentation tasks with multi-scale features. The Atrous Spatial Pyramid Pooling (ASPP) module present in Deeplab models helps to increase the field of view for extracting multi-scale features. In this work, we analyze how the ASPP extracts multi-scale features in DeeplabV3+ by comparing the performance of the DeeplabV3+ model with and without ASPP on a water segmentation task. The dataset used for this study is a public dataset that contains RGB satellite images and corresponding masks from the Sentinel-2 A/B satellites. The evaluation metrics used for this study are IoU score, Dice score, Recall, and Precision. The results show that the DeeplabV3+ model with ASPP extracts water bodies of different sizes. It also captures the boundaries more accurately than the model without the ASPP module.
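The four metrics named above can all be derived from the pixel-wise counts of true positives, false positives, and false negatives. The sketch below computes them for binary masks given as flat lists of 0/1 values; the function name and the convention of returning 1.0 when a denominator is zero are assumptions made for this example, not details taken from the cited study.

```python
def segmentation_scores(pred, target):
    """Return IoU, Dice, precision, and recall for two binary masks (flat 0/1 lists)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, target) if p and t)      # water predicted, water present
    fp = sum(1 for p, t in zip(pred, target) if p and not t)  # water predicted, none present
    fn = sum(1 for p, t in zip(pred, target) if not p and t)  # water missed

    def ratio(numerator, denominator):
        # Assumed convention: an empty denominator counts as a perfect score.
        return numerator / denominator if denominator else 1.0

    return {
        "iou": ratio(tp, tp + fp + fn),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    ground_truth = [1, 0, 1, 0]
    print(segmentation_scores(predicted, ground_truth))
```

Dice is never lower than IoU on the same masks, so the two scores should not be compared against each other across papers.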

  • Research Article
  • Citations: 1
  • 10.1016/j.jag.2024.104236
SpaGAN: A spatially-aware generative adversarial network for building generalization in image maps
  • Dec 1, 2024
  • International Journal of Applied Earth Observation and Geoinformation
  • Zhiyong Zhou + 2 more

Building generalization is an essential task in generating multi-scale topographic maps. The progress of deep learning offers a new paradigm to overcome the coordination challenges faced by conventional building generalization algorithms. Some studies have confirmed the feasibility of several original semantic segmentation networks, such as U-Net and its variants and the conditional generative adversarial network (cGAN), for building generalization in image maps. However, they suffer from critical deformation effects, especially for large and geometrically complex buildings. Since learning building generalization essentially means modeling the subtle transformation of building footprints across scales, we argue that the spatial awareness of a neural network, for instance, regarding building size and shape, is crucial to effective learning. Thus, we propose a spatially-aware generative adversarial network, SpaGAN. It takes a representative cGAN, pix2pix, as the backbone, and modifies two modules: In the U-Net-based generator, an atrous spatial pyramid pooling (ASPP) module replaces the conventional convolutional module to extract multi-scale features of buildings of varying sizes and shapes; in the PatchGAN-based discriminator, a signed distance map (SDM) module is used to capture the fine-grained shape difference for discrimination. The proposed network was comprehensively evaluated with a synthetic and a real-world dataset. The results demonstrate that SpaGAN outperforms existing baseline models (U-Net, ResU-Net, pix2pix) for building generalization, particularly in the real-world dataset. The new model can achieve more reasonable aggregation, simplification, and squaring generalization operators.

  • Research Article
  • 10.1002/eng2.70330
Design of Generative Adversarial Network Super‐Resolution Reconstruction Algorithm for Intelligent Security Images
  • Aug 1, 2025
  • Engineering Reports
  • Yuyao Liu + 1 more

In the reconstruction of security images, the current generative adversarial network super‐resolution reconstruction algorithm is prone to generate unrealistic artifacts under high noise and low contrast, and the details of small targets are blurred. This paper adopts an improved Real‐ESRGAN (Real‐Enhanced Super‐Resolution Generative Adversarial Network) and CBAM (Convolutional Block Attention Module) algorithm to perform super‐resolution reconstruction of security images and improve the quality of reconstructed images. The study applies multi‐scale convolution kernels in the RRDB (residual in residual dense block) module of the model and uses convolution kernels of different sizes to extract and fuse image features to comprehensively capture local and overall detail information in the image. Then, based on Real‐ESRGAN, the CBAM module is applied in the generator, and the channel attention and spatial attention mechanisms are used to adaptively focus on multi‐scale features to enhance the modeling ability of texture details in the target area. Finally, a multi‐loss fusion optimization strategy is adopted, and color consistency loss and total variation loss are applied to effectively suppress artifacts and color drift problems in the reconstruction process. The experiment takes the security image in the license plate recognition task as an example to perform super‐resolution reconstruction. The results show that the PSNR (Peak Signal‐to‐Noise Ratio) and SSIM (Structural Similarity) of the improved Real‐ESRGAN‐CBAM are the best, reaching 20.1 dB and 0.635, respectively, under extremely high noise of 5 dB, and still reaching 26.1 dB and 0.765 under extremely low contrast of 0.2. The experimental results show that the improved Real‐ESRGAN and CBAM combined algorithm in this paper greatly improves the reconstruction quality of security images, effectively suppresses the generation of artifacts, and can better meet the dual needs of intelligent security systems in image quality and practicality.

  • Research Article
  • Citations: 4
  • 10.1109/lgrs.2020.3036823
Semisupervised Multiscale Generative Adversarial Network for Semantic Segmentation of Remote Sensing Image
  • Dec 1, 2020
  • IEEE Geoscience and Remote Sensing Letters
  • Jiaqi Wang + 7 more

Semantic segmentation of remote sensing images based on deep neural networks has gained wide attention recently. Although many methods have achieved amazing performance, they need large amounts of labeled images to distinguish the differences in angle, color, size, and other aspects for small targets in remote sensing data sets. However, with a few labeled images, it is difficult to extract the key features of small targets. We propose a semisupervised multiscale generative adversarial network (GAN), which not only utilizes the multipath input and atrous spatial pyramid pooling (ASPP) module but leverages unlabeled images and semisupervised learning strategy to improve the performance of small target segmentation in semantic segmentation when labeled data amount is small. Experimental results show that our model outperforms state-of-the-art methods with insufficient labeled data.

  • Research Article
  • 10.3389/fphys.2025.1457197
Enhanced thyroid nodule detection and diagnosis: a mobile-optimized DeepLabV3+ approach for clinical deployments.
  • Mar 24, 2025
  • Frontiers in physiology
  • Changan Yang + 6 more

This study aims to enhance the efficiency and accuracy of thyroid nodule segmentation in ultrasound images, ultimately improving nodule detection and diagnosis. For clinical deployment on mobile and embedded devices, DeepLabV3+ strives to achieve a balance between a lightweight architecture and high segmentation accuracy. A comprehensive dataset of ultrasound images was meticulously curated using a high-resolution ultrasound imaging device. Data acquisition adhered to standardized protocols to ensure high-quality imaging. Preprocessing steps, including noise reduction and contrast optimization, were applied to enhance image clarity. Expert radiologists provided ground truth labels through meticulous annotation. To improve segmentation performance, we integrated MobileNetV2 and Depthwise Separable Dilated Convolution into the Atrous Spatial Pyramid Pooling (ASPP) module, incorporating the Pyramid Pooling Module (PPM) and attention mechanisms. To mitigate classification imbalances, we employed Tversky loss functions in the ultrasound image classification process. In semantic image segmentation, DeepLabV3+ achieved an impressive Intersection over Union (IoU) of 94.37%, while utilizing only 12.4 MB of parameters, including weights and biases. This remarkable accuracy demonstrates the effectiveness of our approach. A high IoU value in medical imaging analysis reflects the model's ability to accurately delineate object boundaries. DeepLabV3+ represents a significant advancement in thyroid nodule segmentation, particularly for thyroid cancer screening and diagnosis. The obtained segmentation results suggest promising directions for future research, especially in the early detection of thyroid nodules. Deploying this algorithm on mobile devices offers a practical solution for early diagnosis and is likely to improve patient outcomes.
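To make the role of the Tversky loss mentioned above concrete, here is a minimal sketch of its usual form, 1 − TP / (TP + α·FP + β·FN), computed on soft predictions. The function name, the weights α = 0.3 and β = 0.7, and the smoothing constant are assumptions for illustration only; the abstract does not report the values the authors used.

```python
def tversky_loss(probs, target, alpha=0.3, beta=0.7, smooth=1e-6):
    """Tversky loss for predicted foreground probabilities and a 0/1 target mask.

    alpha weights false positives and beta weights false negatives;
    alpha = beta = 0.5 reduces the index to the Dice coefficient.
    """
    if len(probs) != len(target):
        raise ValueError("probs and target must have the same length")
    tp = sum(p * t for p, t in zip(probs, target))        # soft true positives
    fp = sum(p * (1 - t) for p, t in zip(probs, target))  # soft false positives
    fn = sum((1 - p) * t for p, t in zip(probs, target))  # soft false negatives
    index = (tp + smooth) / (tp + alpha * fp + beta * fn + smooth)
    return 1.0 - index


if __name__ == "__main__":
    mask = [1, 1, 0, 0]
    missed_nodule = [1.0, 0.0, 0.0, 0.0]   # one false negative
    extra_region = [1.0, 1.0, 1.0, 0.0]    # one false positive
    print("false negative:", round(tversky_loss(missed_nodule, mask), 4))
    print("false positive:", round(tversky_loss(extra_region, mask), 4))
```

Setting β above α penalizes missed foreground pixels more heavily than spurious ones, which is one common way such a loss is used when the foreground class is rare.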

  • Research Article
  • Citations: 9
  • 10.3389/fpls.2024.1305358
RAAWC-UNet: an apple leaf and disease segmentation method based on residual attention and atrous spatial pyramid pooling improved UNet with weight compression loss.
  • Mar 11, 2024
  • Frontiers in Plant Science
  • Jianlong Wang + 4 more

Early detection of leaf diseases is necessary to control the spread of plant diseases, and one of the important steps is the segmentation of leaf and disease images. The uneven light and leaf overlap in complex situations make segmentation of leaves and diseases quite difficult. Moreover, the significant differences in ratios of leaf and disease pixels results in a challenge in identifying diseases. To solve the above issues, the residual attention mechanism combined with atrous spatial pyramid pooling and weight compression loss of UNet is proposed, which is named RAAWC-UNet. Firstly, weights compression loss is a method that introduces a modulation factor in front of the cross-entropy loss, aiming at solving the problem of the imbalance between foreground and background pixels. Secondly, the residual network and the convolutional block attention module are combined to form Res_CBAM. It can accurately localize pixels at the edge of the disease and alleviate the vanishing of gradient and semantic information from downsampling. Finally, in the last layer of downsampling, the atrous spatial pyramid pooling is used instead of two convolutions to solve the problem of insufficient spatial context information. The experimental results show that the proposed RAAWC-UNet increases the intersection over union in leaf and disease segmentation by 1.91% and 5.61%, and the pixel accuracy of disease by 4.65% compared with UNet. The effectiveness of the proposed method was further verified by the better results in comparison with deep learning methods with similar network architectures.

  • Research Article
  • Citations: 7
  • 10.3389/fpls.2023.1139666
An improved Deeplab V3+ network based coconut CT image segmentation method
  • Dec 8, 2023
  • Frontiers in Plant Science
  • Qianfan Liu + 7 more

Due to the unique structure of coconuts, their cultivation heavily relies on manual experience, making it difficult to accurately and timely observe their internal characteristics. This limitation severely hinders the optimization of coconut breeding. To address this issue, we propose a new model based on the improved architecture of Deeplab V3+. We replace the original ASPP (Atrous Spatial Pyramid Pooling) structure with a dense atrous spatial pyramid pooling module and introduce CBAM (Convolutional Block Attention Module). This approach resolves the issue of information loss due to sparse sampling and effectively captures global features. Additionally, we embed an RRM (residual refinement module) after the output level of the decoder to optimize boundary information between organs. Multiple model comparisons and ablation experiments are conducted, demonstrating that the improved segmentation algorithm achieves higher accuracy when dealing with diverse coconut organ CT (Computed Tomography) images. Our work provides a new solution for accurately segmenting internal coconut organs, which facilitates scientific decision-making for coconut researchers at different stages of growth.

More from: Applied Sciences
  • New
  • Research Article
  • 10.3390/app152111838
Research on Target Detection and Counting Algorithms for Swarming Termites in Agricultural and Forestry Disaster Early Warning
  • Nov 6, 2025
  • Applied Sciences
  • Hechuang Wang + 2 more

  • New
  • Research Article
  • 10.3390/app152111830
Hybrid Coatings of Chitosan-Tetracycline-Oxide Layer on Anodized Ti-13Zr-13Nb Alloy as New Drug Delivery System
  • Nov 6, 2025
  • Applied Sciences
  • Aizada Utenaliyeva + 6 more

  • New
  • Research Article
  • 10.3390/app152111846
SP-Transformer: A Medium- and Long-Term Photovoltaic Power Forecasting Model Integrating Multi-Source Spatiotemporal Features
  • Nov 6, 2025
  • Applied Sciences
  • Bin Wang + 5 more

  • New
  • Research Article
  • 10.3390/app152111831
Non-Steady-State Coupled Model of Viscosity–Temperature–Pressure in Polymer Flooding Injection Wellbores
  • Nov 6, 2025
  • Applied Sciences
  • Yutian Huang + 5 more

  • New
  • Research Article
  • 10.3390/app152111823
Enhancing Biscuit Nutritional Value Through Apple and Sour Cherry Pomace Fortification
  • Nov 6, 2025
  • Applied Sciences
  • Maria Bianca Mandache + 3 more

  • New
  • Research Article
  • 10.3390/app152111833
Electric Field and Charge Characteristics at the Gas–Solid Interface of a Scaled HVDC Wall Bushing Model
  • Nov 6, 2025
  • Applied Sciences
  • Wenhao Lu + 7 more

  • New
  • Research Article
  • 10.3390/app152111845
Detecting Audio Copy-Move Forgeries on Mel Spectrograms via Hybrid Keypoint Features
  • Nov 6, 2025
  • Applied Sciences
  • Ezgi Ozgen + 1 more

  • New
  • Research Article
  • 10.3390/app152111840
A-BiYOLOv9: An Attention-Guided YOLOv9 Model for Infrared-Based Wind Turbine Inspection
  • Nov 6, 2025
  • Applied Sciences
  • Sami Ekici + 2 more

  • New
  • Research Article
  • 10.3390/app152111825
CFD Analysis of Natural Convection Performance of a MMRTG Model Under Martian Atmospheric Conditions
  • Nov 6, 2025
  • Applied Sciences
  • Rafael Bardera-Mora + 4 more

  • New
  • Research Article
  • 10.3390/app152111820
Internal and External Loads in U16 Women’s Basketball Players Participating in U18 Training Sessions: A Case Study
  • Nov 6, 2025
  • Applied Sciences
  • Álvaro Bustamante-Sánchez + 3 more
