CycleGAN with Atrous Spatial Pyramid Pooling and Attention-Enhanced MobileNetV4 for Tomato Disease Recognition Under Limited Training Data
To address the poor generalization and suboptimal recognition accuracy caused by limited and imbalanced samples in tomato leaf disease identification, this study proposes a recognition strategy that combines an enhanced generative-adversarial-network image augmentation method with a lightweight deep learning model. First, an Atrous Spatial Pyramid Pooling (ASPP) module is integrated into the CycleGAN framework, strengthening the generator's capacity to model multi-scale lesion features and thereby improving the diversity and realism of synthesized images. Second, the Convolutional Block Attention Module (CBAM), which combines channel and spatial attention, is embedded into the MobileNetV4 architecture to sharpen the model's focus on critical disease regions. Experimental results show that the proposed ASPP-CycleGAN significantly outperforms the original CycleGAN across multiple disease image generation tasks. The resulting CBAM-MobileNetV4 model achieves an average recognition accuracy above 97% for common tomato diseases, including early blight, late blight, and mosaic disease, a 1.86% improvement over the baseline MobileNetV4. These findings indicate that the proposed method provides strong data augmentation and classification performance under small-sample conditions, offering an effective technical basis for the intelligent identification and control of tomato leaf diseases.
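The multi-scale modeling that ASPP contributes comes from atrous (dilated) convolutions run in parallel at several dilation rates and then fused. As a minimal 1-D illustration (not the paper's implementation; function names are ours), a kernel of size k with dilation d covers a receptive field of (k − 1)·d + 1 samples, so the same small kernel sees progressively wider context:

```python
def dilated_conv1d(x, kernel, dilation):
    """Valid-mode 1-D convolution whose kernel taps are spaced
    `dilation` samples apart (atrous convolution)."""
    span = (len(kernel) - 1) * dilation  # receptive field minus one
    return [sum(k * x[i + j * dilation] for j, k in enumerate(kernel))
            for i in range(len(x) - span)]

# The same 3-tap kernel sees 3, 5, and 7 input samples at rates 1, 2, 3;
# ASPP runs such branches in parallel and concatenates their outputs.
signal = list(range(8))
multi_scale = [dilated_conv1d(signal, [1, 1, 1], d) for d in (1, 2, 3)]
```

In a real ASPP block, the parallel branch outputs are concatenated along the channel axis and fused with a 1×1 convolution; the sketch above only shows why different rates yield features at different scales.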
- Research Article
- 10.3390/electronics13132427
- Jun 21, 2024
- Electronics
The underground cable conduit system, a vital component of urban power transmission and distribution infrastructure, is difficult to maintain and inspect for residue. Traditional detection methods such as Closed-Circuit Television (CCTV) rely heavily on the expertise and prior experience of professional inspectors, making results acquisition time-consuming and subjective. To address these issues and automate defect detection in underground cable conduits, this paper proposes a defect recognition algorithm based on an enhanced YOLOv8 model. First, we replace the Spatial Pyramid Pooling-Fast (SPPF) module in the original model with the Atrous Spatial Pyramid Pooling (ASPP) module to capture multi-scale defect features effectively. Second, to enhance feature representation and reduce noise interference, we integrate the Convolutional Block Attention Module (CBAM) into the detection head. Finally, we enhance the YOLOv8 backbone by replacing the C2f module with the base module of ShuffleNet V2, reducing the number of model parameters and improving efficiency. Experimental results demonstrate the efficacy of the proposed algorithm in recognizing pipe misalignment and residual foreign objects: precision and mean average precision (mAP) reach 96.2% and 97.6%, respectively, improvements over the original YOLOv8 model. This study significantly improves the capture and characterization of defect features, thereby enhancing the maintenance efficiency and accuracy of underground cable conduit systems.
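CBAM's channel branch derives a per-channel weight from the globally average- and max-pooled descriptors, passes them through a shared MLP, and gates the feature map with a sigmoid. The sketch below (our simplification, not the paper's code) collapses the shared MLP to an identity so the gating mechanics stay visible:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def channel_attention(feature_map):
    """feature_map: list of channels, each a flat list of activations.
    Per CBAM, each channel weight combines the average-pooled and
    max-pooled descriptors; the shared MLP is collapsed to identity
    here for brevity."""
    weights = [sigmoid(sum(ch) / len(ch) + max(ch)) for ch in feature_map]
    return [[w * v for v in ch] for w, ch in zip(weights, feature_map)]

# Channels with stronger responses receive weights closer to 1.
refined = channel_attention([[0.0, 0.0, 0.0], [2.0, 4.0, 6.0]])
```

The spatial branch works analogously: it pools across channels at every location and convolves the pooled maps to produce a per-pixel gate, which is why CBAM helps the detection head suppress background noise.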
- Conference Article
- 10.1109/ccis57298.2022.10016433
- Nov 26, 2022
Deeplabv3+ is a standard semantic segmentation model that adds a decoding structure to recover the image's spatial information and uses the Atrous Spatial Pyramid Pooling (ASPP) module to address the multi-scale problem. However, the Deeplabv3+ model has drawbacks in restoring details. We therefore propose the CB_Deeplabv3+ model. In its encoding structure, we use cascaded and parallel ASPP modules to extend the network and enable the model to capture richer context information by increasing information interaction between channels. CB_Deeplabv3+ also introduces the Convolutional Block Attention Module (CBAM) to address the long-distance dependence problem in the encoding-decoding structure. Experimental evaluation on the Part_VOC dataset shows that CB_Deeplabv3+ achieves excellent semantic segmentation performance.
- Conference Article
- 10.1109/igarss46834.2022.9883157
- Jul 17, 2022
Researchers can gain more intuitive information by converting synthetic aperture radar (SAR) images to optical images using generative adversarial networks (GANs). However, these GANs have poor feature extraction ability, which leads to color conversion errors and loss of detail. To solve this problem, we add an atrous spatial pyramid pooling (ASPP) module to the GAN, yielding ASPP-GAN. The ASPP module extracts multi-resolution feature responses, enabling the network to attend to both overall and detailed features. Experimental results on the public SEN1-2 dataset show that ASPP-GAN improves significantly over the traditional GAN, with Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) values improved by about 20%.
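PSNR, one of the two metrics reported above, follows directly from the mean squared error between a reference and a reconstructed image. A minimal sketch over flattened 8-bit pixel lists (function name ours):

```python
import math

def psnr(reference, test, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(peak^2 / MSE).
    Identical images have zero MSE, so PSNR is infinite."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)
```

SSIM, by contrast, compares local means, variances, and covariance within sliding windows rather than raw pixel error; for both metrics, higher values indicate a closer match to the reference.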
- Research Article
- 10.3390/aerospace12060542
- Jun 15, 2025
- Aerospace
A novel interactive maintenance method for space station in-orbit devices using scene semantic segmentation is proposed. First, a wearable and handheld system is designed to capture images of the astronaut's forward-view scene in the space station and play them on a handheld terminal in real time. Second, the system quantitatively evaluates the scene's environmental lighting by computing image quality evaluation parameters; if the lighting is inadequate, a prompt reminds the astronaut to adjust the environment illumination. Third, the system adopts an improved DeepLabV3+ network for semantic segmentation of these forward-view scene images. In the improved network, the original backbone is replaced with a lightweight convolutional neural network, MobileNetV2, which has a smaller model size and lower computational complexity. The convolutional block attention module (CBAM) is introduced to improve the network's feature perception, and the atrous spatial pyramid pooling (ASPP) module is retained for accurate encoding of multi-scale information. Extensive simulation results indicate that the accuracy, precision, and mean intersection over union of the proposed algorithm exceed 95.0%, 96.0%, and 89.0%, respectively. Ground application experiments also show that the proposed technique effectively shortens the system user's working time.
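The abstract does not specify which image quality parameters gate the lighting check; a plausible minimal version uses global statistics such as mean brightness and RMS contrast. The sketch below is purely illustrative: the function name and all thresholds are hypothetical, not taken from the paper.

```python
def lighting_ok(pixels, min_mean=40.0, max_mean=220.0, min_contrast=15.0):
    """Crude lighting gate on flattened 8-bit grayscale pixels: mean
    brightness must sit away from the extremes and RMS contrast (the
    standard deviation) must be high enough. Thresholds are
    illustrative, not the paper's."""
    mean = sum(pixels) / len(pixels)
    rms_contrast = (sum((p - mean) ** 2 for p in pixels) / len(pixels)) ** 0.5
    return min_mean <= mean <= max_mean and rms_contrast >= min_contrast
```

A gate like this is cheap enough to run per frame before the segmentation network, which matches the paper's flow of prompting the astronaut before attempting segmentation.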
- Research Article
- 10.1016/j.eswa.2023.119560
- Jan 13, 2023
- Expert Systems with Applications
Sw-YoloX: An anchor-free detector based transformer for sea surface object detection
- Conference Article
- 10.23919/ccc52363.2021.9550003
- Jul 26, 2021
Sonar image feature extraction faces several difficulties, such as strong speckle noise, low image resolution, poor image quality, and difficult target segmentation. To overcome the shortcomings of traditional algorithms, we apply the Mask RCNN instance segmentation network to sonar image feature extraction and verify the deep learning algorithm's effectiveness experimentally. First, we expand the dataset with online and offline data augmentation. Then, we improve the residual network by adding a convolutional block attention module (CBAM), group normalization (GN), and an atrous spatial pyramid pooling (ASPP) module. Finally, we select the best network structure and add a focal loss function to improve semantic segmentation. The final experimental results show that the Average Precision (AP) and Mean Intersection over Union (mIoU) of the proposed network improve over the original network on the side scan sonar dataset.
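Focal loss, added in the last step above, scales the cross-entropy term by a modulation factor (1 − p_t)^γ so that well-classified examples contribute little and training concentrates on hard ones. A binary sketch (function name ours; γ = 2 and α = 0.25 are the conventional defaults, not values stated in this abstract):

```python
import math

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """p: predicted probability of the positive class; y: label in {0, 1}.
    The (1 - pt)**gamma factor shrinks the loss of confident, correct
    predictions; alpha balances the positive/negative classes."""
    pt = p if y == 1 else 1.0 - p
    balance = alpha if y == 1 else 1.0 - alpha
    return -balance * (1.0 - pt) ** gamma * math.log(pt)
```

With γ = 0 the modulation factor disappears and the expression reduces to ordinary α-weighted cross-entropy, which makes the effect of γ easy to ablate.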
- Research Article
- 10.3390/rs17213563
- Oct 28, 2025
- Remote Sensing
Wheat and corn are two major food crops in Xinjiang. However, the spectral similarity between these crops and the complexity of their spatial distribution have posed significant challenges to accurate crop identification. This study therefore aimed to improve crop distribution identification in complex environments in three ways. First, by analysing the kNDVI and EVI time series, the optimal identification window was determined to be days 156–176, when wheat is in the grain-filling to milk-ripening phase and maize is in the jointing to tillering phase, the period during which the spectral differences between the two crops are strongest. Second, principal component analysis (PCA) was applied to Sentinel-2 data, and the top three principal components were extracted to construct the input dataset, effectively integrating visible and near-infrared band information; this suppressed redundancy and noise while replacing traditional RGB datasets. Finally, the Convolutional Block Attention Module (CBAM) was integrated into the U-Net model to enhance feature focusing on key crop areas, and an improved Atrous Spatial Pyramid Pooling (ASPP) module based on depthwise separable convolutions was adopted to reduce computation while boosting multi-scale context awareness. The experimental results showed the following: (1) Wheat and corn exhibit clear phenological differences between days 156 and 176 of the year, which can serve as the optimal time window for identifying their spatial distributions. (2) The proposed method performed best, with mIoU, mPA, F1-score, and overall accuracy (OA) reaching 83.03%, 91.34%, 90.73%, and 90.91%, respectively. Compared with DeeplabV3+, PSPnet, HRnet, Segformer, and U-Net, the OA improved by 5.97%, 4.55%, 2.03%, 8.99%, and 1.5%, respectively, and the PCA dataset improved recognition accuracy by approximately 2% over the RGB dataset. (3) The strategy retained high accuracy when predicting wheat and corn yields in Qitai County, Xinjiang, showing a degree of generalisability. In summary, the improved strategy proposed in this study holds considerable application potential for identifying the spatial distribution of wheat and corn in arid regions.
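The time-series screening above rests on vegetation indices computed per pixel from red and near-infrared reflectance. One common formulation of kNDVI uses the RBF-kernel simplification tanh(NDVI²), which bounds the index in [0, 1) and saturates less than raw NDVI in dense canopy; the sketch below assumes that formulation (function names ours):

```python
import math

def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red)

def kndvi(nir, red):
    """Kernel NDVI under the common RBF simplification tanh(NDVI**2)."""
    return math.tanh(ndvi(nir, red) ** 2)
```

Computing such indices per acquisition date and comparing the wheat and maize curves is what exposes the day 156–176 window where the two crops separate most clearly.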
- Research Article
- 10.3390/electronics14142865
- Jul 17, 2025
- Electronics
Lane detection is a key technology in automatic driving environment perception, and its accuracy directly affects vehicle positioning, path planning, and driving safety. In this study, an enhanced real-time model for lane detection based on an improved DeepLabV3+ architecture is proposed to address the challenges posed by complex dynamic backgrounds and blurred road boundaries in suburban road scenarios. To address the lack of feature correlation in the traditional Atrous Spatial Pyramid Pooling (ASPP) module of the DeepLabV3+ model, we propose an improved LC-DenseASPP module. First, inspired by DenseASPP, the number of dilated convolution layers is reduced from six to three by adopting a dense connection to enhance feature reuse, significantly reducing computational complexity. Second, the convolutional block attention module (CBAM) attention mechanism is embedded after the LC-DenseASPP dilated convolution operation. This effectively improves the model’s ability to focus on key features through the adaptive refinement of channel and spatial attention features. Finally, an image-pooling operation is introduced in the last layer of the LC-DenseASPP to further enhance the ability to capture global context information. DySample is introduced to replace bilinear upsampling in the decoder, ensuring model performance while reducing computational resource consumption. The experimental results show that the model achieves a good balance between segmentation accuracy and computational efficiency, with a mean intersection over union (mIoU) of 95.48% and an inference speed of 128 frames per second (FPS). Additionally, a new lane-detection dataset, SubLane, is constructed to fill the gap in the research field of lane detection in suburban road scenarios.
- Conference Article
- 10.1109/spin57001.2023.10116882
- Mar 23, 2023
Water body segmentation is difficult because of complex shapes, edges, sizes, and surroundings. DeeplabV3+ has proven to perform well on segmentation tasks with multi-scale features; the Atrous Spatial Pyramid Pooling (ASPP) module in Deeplab models enlarges the field of view for extracting multi-scale features. In this work, we analyze how ASPP extracts multi-scale features in DeeplabV3+ by comparing the performance of the model with and without ASPP on a water segmentation task. The dataset is a public collection of RGB satellite images and corresponding masks from the Sentinel-2 A/B satellites. The evaluation metrics are IoU score, Dice score, recall, and precision. The results show that the DeeplabV3+ model with ASPP extracts water bodies of different sizes and captures boundaries more accurately than the model without the ASPP module.
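The two overlap metrics used in this comparison have simple set-based definitions. A sketch over flattened binary (0/1) masks (function names ours; real pipelines compute these per class over 2-D masks):

```python
def iou_score(pred, gt):
    """Intersection over Union for flattened binary (0/1) masks."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter / union if union else 1.0

def dice_score(pred, gt):
    """Dice = 2|A ∩ B| / (|A| + |B|); equal to F1 on binary masks."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    total = sum(pred) + sum(gt)
    return 2.0 * inter / total if total else 1.0
```

Dice weights the intersection more heavily than IoU (Dice = 2·IoU / (1 + IoU)), so reporting both, as this study does, gives a fuller picture of boundary quality.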
- Research Article
- 10.1016/j.jag.2024.104236
- Dec 1, 2024
- International Journal of Applied Earth Observation and Geoinformation
Building generalization is an essential task in generating multi-scale topographic maps. The progress of deep learning offers a new paradigm to overcome the coordination challenges faced by conventional building generalization algorithms. Some studies have confirmed the feasibility of several original semantic segmentation networks, such as U-Net and its variants and the conditional generative adversarial network (cGAN), for building generalization in image maps. However, they suffer from critical deformation effects, especially for large and geometrically complex buildings. Since learning building generalization essentially means modeling the subtle transformation of building footprints across scales, we argue that the spatial awareness of a neural network, for instance, regarding building size and shape, is crucial to effective learning. Thus, we propose a spatially-aware generative adversarial network, SpaGAN. It takes a representative cGAN, pix2pix, as the backbone, and modifies two modules: In the U-Net-based generator, an atrous spatial pyramid pooling (ASPP) module replaces the conventional convolutional module to extract multi-scale features of buildings of varying sizes and shapes; in the PatchGAN-based discriminator, a signed distance map (SDM) module is used to capture the fine-grained shape difference for discrimination. The proposed network was comprehensively evaluated with a synthetic and a real-world dataset. The results demonstrate that SpaGAN outperforms existing baseline models (U-Net, ResU-Net, pix2pix) for building generalization, particularly in the real-world dataset. The new model can achieve more reasonable aggregation, simplification, and squaring generalization operators.
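The signed distance map that SpaGAN's discriminator uses encodes, for every pixel, how far it lies from the shape, with the sign flipping inside the footprint, so small shape differences produce measurably different maps. A toy 1-D analogue (our illustration, not the paper's code, which would use a 2-D distance transform to the shape boundary):

```python
def signed_distance_1d(mask):
    """Toy 1-D signed distance for a 0/1 mask: for each cell, the
    distance to the nearest cell of the opposite value, negated inside
    the shape (mask == 1). Real SDMs apply a 2-D Euclidean distance
    transform to the building boundary."""
    out = []
    for i, m in enumerate(mask):
        d = min((abs(i - j) for j, v in enumerate(mask) if v != m),
                default=0)
        out.append(-d if m else d)
    return out
```

Because the map varies smoothly away from the boundary, two masks that differ by a single shifted edge yield distance maps that differ over a whole neighborhood, which is what makes the representation useful for fine-grained shape discrimination.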
- Research Article
- 10.1002/eng2.70330
- Aug 1, 2025
- Engineering Reports
In the reconstruction of security images, current generative adversarial network super-resolution algorithms are prone to unrealistic artifacts under high noise and low contrast, and the details of small targets are blurred. This paper adopts an improved Real-ESRGAN (Real-Enhanced Super-Resolution Generative Adversarial Network) combined with CBAM (Convolutional Block Attention Module) to perform super-resolution reconstruction of security images and improve reconstructed image quality. The study applies multi-scale convolution kernels in the model's RRDB (residual-in-residual dense block) module, using kernels of different sizes to extract and fuse image features and comprehensively capture local and global detail. Then, building on Real-ESRGAN, the CBAM module is applied in the generator, where channel and spatial attention mechanisms adaptively focus on multi-scale features to enhance the modeling of texture details in the target area. Finally, a multi-loss fusion optimization strategy is adopted, with color consistency loss and total variation loss applied to suppress artifacts and color drift during reconstruction. The experiment performs super-resolution reconstruction on security images from a license plate recognition task. The results show that the improved Real-ESRGAN-CBAM achieves the best PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity), reaching 20.1 dB and 0.635, respectively, under extremely high noise of 5 dB, and still reaching 26.1 dB and 0.765 under extremely low contrast of 0.2.
The experimental results show that the improved Real-ESRGAN and CBAM combined algorithm greatly improves the reconstruction quality of security images, effectively suppresses artifacts, and better meets the dual needs of intelligent security systems for image quality and practicality.
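The total variation loss mentioned above suppresses artifacts by penalizing abrupt differences between neighboring pixels. A sketch of the anisotropic (L1) form on a 2-D image stored as a list of rows (function name ours):

```python
def tv_loss(img):
    """Anisotropic total variation of a 2-D image (list of equal-length
    rows): the sum of absolute differences between horizontal and
    vertical neighbors. Smooth regions contribute nothing; noise and
    checkerboard artifacts raise the penalty."""
    h = sum(abs(row[j + 1] - row[j])
            for row in img for j in range(len(row) - 1))
    v = sum(abs(img[i + 1][j] - img[i][j])
            for i in range(len(img) - 1) for j in range(len(img[0])))
    return h + v
```

In a GAN training objective this term is added with a small weight, trading a slight loss of fine texture for fewer high-frequency artifacts.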
- Research Article
- 10.1109/lgrs.2020.3036823
- Dec 1, 2020
- IEEE Geoscience and Remote Sensing Letters
Semantic segmentation of remote sensing images based on deep neural networks has gained wide attention recently. Although many methods achieve strong performance, they need large amounts of labeled images to distinguish differences in angle, color, size, and other aspects for small targets in remote sensing datasets; with few labeled images, it is difficult to extract the key features of small targets. We propose a semisupervised multiscale generative adversarial network (GAN) that not only utilizes multipath input and an atrous spatial pyramid pooling (ASPP) module but also leverages unlabeled images and a semisupervised learning strategy to improve small-target segmentation when labeled data are scarce. Experimental results show that our model outperforms state-of-the-art methods with insufficient labeled data.
- Research Article
- 10.3389/fphys.2025.1457197
- Mar 24, 2025
- Frontiers in physiology
This study aims to enhance the efficiency and accuracy of thyroid nodule segmentation in ultrasound images, ultimately improving nodule detection and diagnosis. For clinical deployment on mobile and embedded devices, DeepLabV3+ strives to achieve a balance between a lightweight architecture and high segmentation accuracy. A comprehensive dataset of ultrasound images was meticulously curated using a high-resolution ultrasound imaging device. Data acquisition adhered to standardized protocols to ensure high-quality imaging. Preprocessing steps, including noise reduction and contrast optimization, were applied to enhance image clarity. Expert radiologists provided ground truth labels through meticulous annotation. To improve segmentation performance, we integrated MobileNetV2 and Depthwise Separable Dilated Convolution into the Atrous Spatial Pyramid Pooling (ASPP) module, incorporating the Pyramid Pooling Module (PPM) and attention mechanisms. To mitigate classification imbalances, we employed Tversky loss functions in the ultrasound image classification process. In semantic image segmentation, DeepLabV3+ achieved an impressive Intersection over Union (IoU) of 94.37%, while utilizing only 12.4 MB of parameters, including weights and biases. This remarkable accuracy demonstrates the effectiveness of our approach. A high IoU value in medical imaging analysis reflects the model's ability to accurately delineate object boundaries. DeepLabV3+ represents a significant advancement in thyroid nodule segmentation, particularly for thyroid cancer screening and diagnosis. The obtained segmentation results suggest promising directions for future research, especially in the early detection of thyroid nodules. Deploying this algorithm on mobile devices offers a practical solution for early diagnosis and is likely to improve patient outcomes.
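The Tversky loss used here to counter class imbalance generalizes the Dice score with two weights: α penalizes false positives and β penalizes false negatives. A sketch over flattened masks (function name and the α = 0.3, β = 0.7 defaults are illustrative; the abstract does not state the paper's values):

```python
def tversky_loss(pred, gt, alpha=0.3, beta=0.7):
    """1 - Tversky index over flattened 0/1 (or soft) masks.
    alpha weights false positives, beta false negatives;
    alpha = beta = 0.5 recovers the Dice score, while beta > alpha
    penalizes missed foreground more, which helps when the nodule
    class is rare."""
    tp = sum(p * g for p, g in zip(pred, gt))
    fp = sum(p * (1 - g) for p, g in zip(pred, gt))
    fn = sum((1 - p) * g for p, g in zip(pred, gt))
    return 1.0 - tp / (tp + alpha * fp + beta * fn)
```

Raising β relative to α is the usual choice in medical segmentation, since missing lesion pixels is costlier than over-segmenting slightly.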
- Research Article
- 10.3389/fpls.2024.1305358
- Mar 11, 2024
- Frontiers in Plant Science
Early detection of leaf diseases is necessary to control the spread of plant diseases, and an important step is the segmentation of leaf and disease images. Uneven light and leaf overlap in complex scenes make segmentation of leaves and diseases quite difficult, and the large difference between the ratios of leaf and disease pixels makes identifying diseases challenging. To solve these issues, we propose RAAWC-UNet, a UNet combining a residual attention mechanism with atrous spatial pyramid pooling and a weight compression loss. First, the weight compression loss introduces a modulation factor in front of the cross-entropy loss to address the imbalance between foreground and background pixels. Second, the residual network and the convolutional block attention module are combined into Res_CBAM, which accurately localizes pixels at disease edges and alleviates the vanishing of gradients and semantic information during downsampling. Finally, in the last downsampling layer, atrous spatial pyramid pooling replaces two convolutions to address insufficient spatial context. Experimental results show that the proposed RAAWC-UNet increases the intersection over union in leaf and disease segmentation by 1.91% and 5.61%, respectively, and the pixel accuracy of disease by 4.65% compared with UNet. Better results in comparison with deep learning methods of similar architecture further verified the effectiveness of the proposed method.
- Research Article
- 10.3389/fpls.2023.1139666
- Dec 8, 2023
- Frontiers in Plant Science
Due to the unique structure of coconuts, their cultivation relies heavily on manual experience, making it difficult to observe their internal characteristics accurately and promptly. This limitation severely hinders the optimization of coconut breeding. To address this issue, we propose a new model based on an improved Deeplab V3+ architecture. We replace the original ASPP (Atrous Spatial Pyramid Pooling) structure with a dense atrous spatial pyramid pooling module and introduce CBAM (Convolutional Block Attention Module). This resolves the information loss caused by sparse sampling and effectively captures global features. Additionally, we embed an RRM (residual refinement module) after the decoder output to refine boundary information between organs. Multiple model comparisons and ablation experiments demonstrate that the improved segmentation algorithm achieves higher accuracy on diverse coconut organ CT (Computed Tomography) images. Our work provides a new solution for accurately segmenting internal coconut organs, supporting scientific decision-making for coconut researchers at different growth stages.