Quantifying Greenspace with Satellite Images in Karachi, Pakistan, Using a New Data Augmentation Paradigm
Greenspaces in communities are critical for mitigating the effects of climate change and have important impacts on health. Today, the availability of satellite imagery combined with deep learning methods allows for automated greenspace analysis at high resolution. We propose a novel green color augmentation for deep learning model training to better detect and delineate types of greenspace (trees, grass) in satellite imagery. Our method outperforms gold standard methods, which use vegetation indices, by 33.1% in accuracy and 77.7% in Intersection over Union (IoU). The proposed augmentation technique also improves on state-of-the-art deep learning methods for greenspace segmentation by 13.4% (IoU) and 3.11% (accuracy). We apply the method to high-resolution (0.27 m/pixel) satellite images covering Karachi, Pakistan, and illuminate an important need: Karachi has 4.17 m² of greenspace per capita, which significantly lags World Health Organization recommendations. Moreover, greenspaces in Karachi are often in areas of economic development (Pearson’s correlation coefficient of 0.352 between greenspaces and roads, p < 0.001) and correspond to higher land surface temperature in localized areas. Our greenspace analysis and how it relates to infrastructure and climate is relevant to urban planners, public health and government professionals, and ultimately the public, for improved allocation and development of greenspaces.
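The vegetation-index baselines the authors compare against typically classify greenspace by thresholding an index such as NDVI. A minimal sketch of that baseline approach (the band arrays and the 0.3 threshold are illustrative assumptions, not values from the paper):

```python
import numpy as np

def ndvi_greenspace_mask(nir, red, threshold=0.3):
    """Classify pixels as greenspace by thresholding NDVI = (NIR - R) / (NIR + R)."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    ndvi = (nir - red) / (nir + red + 1e-9)  # small epsilon avoids division by zero
    return ndvi > threshold

# Toy 2x2 scene: vegetated pixels reflect strongly in NIR relative to red.
nir = np.array([[0.6, 0.6], [0.2, 0.2]])
red = np.array([[0.1, 0.1], [0.2, 0.2]])
mask = ndvi_greenspace_mask(nir, red)  # top row greenspace, bottom row not
```

Index thresholding like this is cheap but brittle in mixed urban scenes, which is the gap the learned augmentation approach targets.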
- Conference Article
- 10.1109/iccabs.2018.8542071
- Oct 1, 2018
Automatic segmentation and localization of lesions in mammogram (MG) images are challenging problems, even with advanced methods such as deep learning (DL) [1]–[3]. To address these challenges, we propose a U-Net approach to automatically detect and segment lesions in MG images. U-Net [4] is an end-to-end convolutional neural network (CNN) model that has achieved remarkable results in segmenting biomedical images [5]. We modified the architecture of the U-Net model to maximize its precision, using batch normalization, dropout, and data augmentation. Due to its architecture, the proposed U-Net model efficiently predicts a pixel-wise segmentation map of an input full MG image. These pixel-wise segmentation maps help radiologists differentiate benign and malignant lesions depending on the lesion shapes. The main challenge that most DL methods face in mammography is the need for large annotated training datasets: to train such DL networks without over-fitting, thousands or millions of training MG images are needed [1], [3], [5]. In contrast, U-Net is capable of learning from a relatively small training dataset compared to other DL methods [4]. We used publicly available databases (CBIS-DDSM, BCDR-01, and INbreast) and MG images from the University of Connecticut Health Center (UCHC) to train the proposed U-Net model [3]. The proposed U-Net method is trained on MG images that have mass lesions of different sizes, shapes, margins, and intensity variation around mass boundaries. All the training MG images containing suspicious areas are accompanied by associated pixel-level ground truth maps (GTMs), which indicate the background and breast-lesion label for each pixel. A total of 2066 MG images and their corresponding segmentation GTMs are used to train the proposed U-Net model.
Moreover, we applied the adaptive median filter (AMF) and the contrast-limited adaptive histogram equalization (CLAHE) filter to the training MG images to enhance their characteristics and improve the performance of the downstream analysis [3]. We compared the efficiency of our model with those of the state-of-the-art Faster R-CNN model [6] and the region growing (RG) model [7], testing the proposed U-Net method on both film-based and fully digitized MG images. The proposed U-Net model shows slightly better performance in detecting true segments than the Faster R-CNN model and outperforms it significantly in terms of runtime. In addition, the proposed U-Net model gives precise segments of the lesions in the MG images, whereas the Faster R-CNN method gives only bounding boxes surrounding the lesions. Moreover, the proposed U-Net method performs better than the RG model. Data augmentation was very effective in our experiments, increasing the Dice similarity coefficient between the GTMs and the segmented lesion maps from 0.918 to 0.983. The proposed model also yielded an Intersection over Union (IoU) of 0.974, compared to 0.966 for the state-of-the-art Faster R-CNN model. In conclusion, the performance of the proposed DL model shows promise for practical clinical application to assist radiologists.
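The Dice coefficient and IoU reported above are both computed from the overlap of binary masks. A minimal sketch of the two metrics (not the authors' code):

```python
import numpy as np

def dice_and_iou(pred, gt):
    """Dice = 2|A∩B| / (|A| + |B|); IoU = |A∩B| / |A∪B| for binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum())
    iou = inter / union
    return dice, iou

pred = np.array([[1, 1], [0, 0]])
gt   = np.array([[1, 0], [0, 0]])
d, i = dice_and_iou(pred, gt)  # Dice = 2/3, IoU = 1/2
```

Note that Dice is always at least as large as IoU for the same pair of masks, which is why the Dice figures above (0.983) exceed the IoU figures (0.974).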
- Research Article
- 10.1080/07038992.2021.1915756
- May 4, 2021
- Canadian Journal of Remote Sensing
This paper investigates deep neural networks for rapid and accurate detection of building rooftops in aerial orthoimages. The networks were trained using manually labeled rooftop vector data digitized on aerial orthoimagery covering the Kitchener-Waterloo area. The performance of three deep learning methods, U-Net, Fully Convolutional Network (FCN), and DeepLabv3+, was compared using the training, validation, and testing sets of the dataset. Our results demonstrate that DeepLabv3+ achieved 63.8% Intersection over Union (IoU), 77.8% mean IoU (mIoU), 74% precision, and 78% F1-score. After introducing focal loss, training loss dropped substantially and convergence accelerated. Rooftop detection also improved: DeepLabv3+ reached 93.6% average pixel accuracy, with 65.4% IoU, 79.0% mIoU, 77.6% precision, and 79.1% F1-score. Lastly, to evaluate the effect of data volume, an ablation study reduced the data volume from 100% to 75% and 50%; as data volume decreased, extraction performance worsened, with IoU, mIoU, precision, and F1-score mostly decreasing.
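The focal loss credited above with cutting training loss and speeding convergence works by down-weighting easy, confidently classified pixels relative to plain cross-entropy. A minimal binary form (the γ and α values below are the common defaults from the focal loss literature, not necessarily the values used in this study):

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t),
    where p_t is the predicted probability of the true class y."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct prediction contributes almost nothing...
easy = focal_loss(0.95, 1)
# ...while a badly missed positive dominates the loss.
hard = focal_loss(0.05, 1)
```

With gamma = 0 and alpha = 1 the expression reduces to ordinary cross-entropy, which makes the down-weighting effect of the `(1 - p_t)^gamma` factor easy to verify.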
- Research Article
- 10.1016/j.compag.2023.107956
- Jun 3, 2023
- Computers and Electronics in Agriculture
Weeds are one of the most detrimental challenges in agriculture, causing significant yield losses as they compete with crops for water, nutrients, and sunlight. Early detection of weeds in the field is critical for taking appropriate actions, such as applying herbicides, mechanical removal, or other remedial treatments, which are less effective or more resource-intensive at later stages of crop growth. In this work, a deep learning method has been developed for weed detection. A sunflower dataset comprising multispectral images, the visible band (400–700 nm wavelengths, RGB) and near infrared (700–1000 nm wavelengths, NIR), captured on various days and at various times, was used for the study. The deep learning model U-Net was trained with images from cotyledon emergence through the subsequent growth stages and tested on images of crops in the last stage of growth, where chemical treatments can be applied. The results of the U-Net were further enhanced by employing conditional random fields to achieve improved segmentation in terms of Intersection over Union (IoU). The proposed method, using the Green (530–600 nm wavelengths) + Filtered-NIR + Normalised Difference Vegetation Index (NDVI) (Weier and Herring, Aug. 2000) channels as the input, achieved the best mean IoU score of 0.883 on images of 512 × 512 pixels. In the same experiment, soil, crop, and weed pixels were correctly predicted with IoU scores of 0.990, 0.906, and 0.753, respectively. The results show that the chosen input and the proposed methodology offer a viable approach for early-stage weed detection.
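The Green + Filtered-NIR + NDVI input described above amounts to stacking one measured band, one filtered band, and one derived index into a three-channel array before feeding it to U-Net. A hedged sketch of that preprocessing step (the array names and random data are illustrative; the paper's NIR filtering step is not reproduced here):

```python
import numpy as np

def build_input(green, nir_filtered, red):
    """Stack Green, filtered NIR, and NDVI into one (H, W, 3) network input."""
    ndvi = (nir_filtered - red) / (nir_filtered + red + 1e-9)
    return np.dstack([green, nir_filtered, ndvi])

h, w = 512, 512  # patch size used in the paper
green = np.random.rand(h, w)
nir = np.random.rand(h, w)
red = np.random.rand(h, w)
x = build_input(green, nir, red)  # shape (512, 512, 3), ready for a 3-channel U-Net
```

Treating the derived NDVI as just another input channel lets a standard 3-channel segmentation network consume spectral-index information without architectural changes.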
- Research Article
- 10.3390/ani13111861
- Jun 2, 2023
- Animals : an Open Access Journal from MDPI
Simple Summary: Timely detection of dead chickens is of great importance on commercial farms. Using multi-source images for dead chicken detection can theoretically achieve higher accuracy and robustness than single-source images. In this study, we introduced a pixel-level image registration method to align the near-infrared (NIR), thermal infrared (TIR), and depth images and analyzed the detection performance of models using different source images. The results showed the following: the model with the NIR image performed best among models with single-source images, and models with dual-source images performed better than those with single-source images. The model with the TIR-NIR image or the NIR-depth image performed better than the model with the TIR-depth image. Detection performance with the TIR-NIR-depth image was better than with single-source images but not significantly different from the TIR-NIR or NIR-depth images. This study provides a reference for selecting and using multi-source images for detecting dead laying hens on commercial farms.

In large-scale laying hen farming, timely detection of dead chickens helps prevent cross-infection, disease transmission, and economic loss. Dead chicken detection is still performed manually and is one of the major labor costs on commercial farms. This study proposed a new method for dead chicken detection using multi-source images and deep learning and evaluated the detection performance with different source images. We first introduced a pixel-level image registration method that used depth information to project the near-infrared (NIR) and depth images into the coordinates of the thermal infrared (TIR) image, resulting in registered images.
Then, the registered single-source (TIR, NIR, depth), dual-source (TIR-NIR, TIR-depth, NIR-depth), and multi-source (TIR-NIR-depth) images were separately used to train dead chicken detection models with object detection networks, including YOLOv8n, Deformable DETR, Cascade R-CNN, and TOOD. The results showed that, at an IoU (Intersection over Union) threshold of 0.5, the performance of these models was not entirely the same. Among them, the model using the NIR-depth image and Deformable DETR achieved the best performance, with an average precision (AP) of 99.7% (IoU = 0.5) and a recall of 99.0% (IoU = 0.5). As the IoU threshold increased, we found the following: the model with the NIR image achieved the best performance among models with single-source images, with an AP of 74.4% (IoU = 0.5:0.95) in Deformable DETR. Performance with dual-source images was higher than with single-source images. The model with the TIR-NIR or NIR-depth image outperformed the model with the TIR-depth image, achieving APs of 76.3% (IoU = 0.5:0.95) and 75.9% (IoU = 0.5:0.95) in Deformable DETR, respectively. The model with the multi-source image also achieved higher performance than models with single-source images. However, there was no significant improvement over the models with the TIR-NIR or NIR-depth image, and the AP of the model with the multi-source image was 76.7% (IoU = 0.5:0.95) in Deformable DETR. By analyzing the detection performance with different source images, this study provides a reference for selecting and using multi-source images for detecting dead laying hens on commercial farms.
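Recall at an IoU threshold, as reported above, counts ground-truth boxes matched by a prediction whose box-level IoU clears the threshold. A minimal sketch of that computation (greedy one-to-one matching on made-up boxes, not the exact COCO-style evaluation protocol used for the paper's numbers):

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def recall_at_iou(preds, gts, thr=0.5):
    """Fraction of ground-truth boxes matched by some unused prediction at IoU >= thr."""
    matched, used = 0, set()
    for g in gts:
        for i, p in enumerate(preds):
            if i not in used and box_iou(p, g) >= thr:
                matched += 1
                used.add(i)
                break
    return matched / len(gts)

gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(1, 1, 10, 10), (40, 40, 50, 50)]
r = recall_at_iou(preds, gts)  # only the first GT is matched -> 0.5
```

Raising `thr` from 0.5 toward 0.95 is exactly what separates the near-perfect IoU = 0.5 results above from the lower IoU = 0.5:0.95 averages.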
- Research Article
- 10.3390/rs13234759
- Nov 24, 2021
- Remote Sensing
Deep learning is a promising method for image classification, including satellite images acquired by various sensors. However, the synergistic use of geospatial data for water body extraction from Sentinel-1 data using deep learning, and the applicability of existing deep learning models, have not been thoroughly tested for operational flood monitoring. Here, we present a novel water body extraction model based on a deep neural network that exploits Sentinel-1 data and flood-related geospatial datasets. For the model, U-Net was customised and optimised to utilise Sentinel-1 data and other flood-related geospatial data, including digital elevation model (DEM), Slope, Aspect, Profile Curvature (PC), Topographic Wetness Index (TWI), Terrain Ruggedness Index (TRI), and Buffer, for the Southeast Asia region. Testing and validation of the water body extraction model were performed on three Sentinel-1 images covering Vietnam, Myanmar, and Bangladesh. By segmenting 384 Sentinel-1 images, model performance and segmentation accuracy were evaluated for all 128 combinations of stacked input layers. Of the 128 cases, 31 showed improvement in Overall Accuracy (OA), and 19 showed improvement in both averaged intersection over union (IoU) and F1 score for the three Sentinel-1 images segmented for water body extraction. The averaged OA, IoU, and F1 scores of the ‘Sentinel-1 VV’ band are 95.77, 80.35, and 88.85, respectively, whereas those of the band combination VV, Slope, PC, and TRI are 96.73, 85.42, and 92.08, showing the improvement gained by exploiting geospatial data. This improvement was further verified with water body extraction results for the Chindwin river basin, where quantitative analysis of the band combination VV, Slope, PC, and TRI showed an F1-score improvement of 7.68 percent over the segmentation output of the ‘Sentinel-1 VV’ band alone.
Through this research, it was demonstrated that the accuracy of deep learning-based water body extraction from Sentinel-1 images can be improved by up to 7.68 percent by employing geospatial data. To the best of our knowledge, this is the first study to demonstrate the synergistic use of geospatial data in deep learning-based water body extraction over wide areas. It is anticipated that these results could be a valuable reference when deep neural networks are applied to satellite image segmentation for operational flood monitoring and when geospatial layers are employed to improve the accuracy of deep learning-based image segmentation.
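Terrain layers such as Slope and TRI in the winning band combination are derived from the DEM before stacking. A hedged numpy sketch of two such derivations (the cell size and the 8-neighbour TRI formula are common definitions, not necessarily the exact ones used in the paper):

```python
import numpy as np

def slope_deg(dem, cell=30.0):
    """Slope in degrees from DEM gradients (cell = pixel size in metres)."""
    dzdy, dzdx = np.gradient(dem, cell)
    return np.degrees(np.arctan(np.hypot(dzdx, dzdy)))

def tri(dem):
    """Terrain Ruggedness Index: mean |elevation difference| to the 8 neighbours.
    (Wrap-around at the edges via np.roll, acceptable for a sketch.)"""
    out = np.zeros_like(dem, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == dx == 0:
                continue
            out += np.abs(dem - np.roll(np.roll(dem, dy, 0), dx, 1))
    return out / 8.0

dem = np.outer(np.arange(5), np.ones(5)) * 30.0  # a uniform 45-degree ramp
s = slope_deg(dem)  # interior slope is 45 degrees
```

Each derived layer is then stacked with the Sentinel-1 VV band as an extra input channel, the same channel-stacking pattern used throughout this literature.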
- Research Article
- 10.3389/fcomp.2023.1235622
- Sep 7, 2023
- Frontiers in Computer Science
Introduction: Kidney tumors are a common cancer in advanced age, and early detection is crucial. Medical imaging and deep learning methods are increasingly attractive for identifying and segmenting kidney tumors. Convolutional neural networks have successfully classified and segmented images, enabling clinicians to recognize and segment tumors effectively. CT scans of kidneys aid in tumor assessment and morphology study, using semantic segmentation techniques for pixel-level identification of the kidney and surrounding anatomy. Accurate diagnostic procedures are crucial for early detection of kidney cancer. Methods: This paper proposes an EfficientNet model for complex segmentation by linking an EfficientNet encoder stage with U-Net. This model represents a more successful system with improved encoder and decoder features. The Intersection over Union (IoU) metric quantifies model performance. Results and Discussion: The EfficientNet models showed high IoU scores for background, kidney, and tumor segmentation, with mean IoU scores ranging from 0.976 for B0 to 0.980 for B4. B7 received the highest IoU score for segmenting kidneys, while B4 received the highest for segmenting tumors. The study utilizes the KiTS19 dataset of contrast-enhanced CT images. Using semantic segmentation with EfficientNet-family U-Net models, our method proved reliable and will aid doctors in accurate tumor detection and image classification for early diagnosis.
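The per-class and mean IoU values reported above come from comparing predicted and ground-truth label maps class by class. A minimal sketch (class indices for background/kidney/tumor are illustrative assumptions):

```python
import numpy as np

def mean_iou(pred, gt, classes=(0, 1, 2)):
    """Per-class IoU (e.g., background=0, kidney=1, tumor=2) and their mean."""
    ious = {}
    for c in classes:
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        ious[c] = inter / union if union else float("nan")
    return ious, np.nanmean(list(ious.values()))

gt   = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 0, 1], [1, 2, 0]])
per_class, miou = mean_iou(pred, gt)  # classes score 2/3, 1.0, 0.5 -> mean 13/18
```

Averaging over classes is what keeps a tiny tumor class from being swamped by the easy background class in the reported score.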
- Research Article
- 10.13287/j.1001-9332.202304.003
- Apr 1, 2023
- Ying yong sheng tai xue bao = The journal of applied ecology
As one of the important timber species in China, Cunninghamia lanceolata is widely distributed in southern China. Information on individual trees and their crowns plays an important role in accurately monitoring forest resources, so it is particularly important to extract such information accurately for individual C. lanceolata trees. For forest stands with high canopy closure, the key to correct extraction is whether mutually occluded and adhering crowns can be accurately segmented. Taking the Fujian Jiangle State-owned Forest Farm as the research area and using UAV imagery as the data source, we developed a method to extract individual tree crown information based on deep learning and the watershed algorithm. First, the deep learning neural network model U-Net was used to segment the canopy coverage area of C. lanceolata; then a traditional image segmentation algorithm was used to separate individual trees and obtain their number and crown information. Under the same training, validation, and test sets, the extraction of canopy coverage by the U-Net model was compared with traditional machine learning methods [random forest (RF) and support vector machine (SVM)]. Two individual tree segmentation results were then compared: one using the marker-controlled watershed algorithm alone, and the other combining the U-Net model with the marker-controlled watershed algorithm. The results showed that the segmentation accuracy (SA), precision, IoU (intersection over union), and F1-score (harmonic mean of precision and recall) of the U-Net model were higher than those of RF and SVM. Compared with RF, the four indicators increased by 4.6%, 14.9%, 7.6%, and 0.05, respectively; compared with SVM, they increased by 3.3%, 8.5%, 8.1%, and 0.05, respectively.
In terms of extracting the number of trees, the overall accuracy (OA) of the U-Net model combined with the marker-controlled watershed algorithm was 3.7% higher than that of the marker-controlled watershed algorithm alone, and the mean absolute error (MAE) decreased by 3.1%. In terms of extracting the crown area and crown width of individual trees, R² increased by 0.11 and 0.09, mean squared error decreased by 8.49 m² and 4.27 m, and MAE decreased by 2.93 m² and 1.72 m, respectively. The combination of the deep learning U-Net model and the watershed algorithm overcomes the challenges of accurately extracting tree counts and individual crown information in high-density pure C. lanceolata plantations. It is an efficient and low-cost method for extracting tree crown parameters and could provide a basis for intelligent forest resource monitoring.
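The marker-controlled watershed step separates touching crowns inside the U-Net canopy mask by flooding from markers placed at distance-transform maxima. A crude scipy-only surrogate of that idea (nearest-marker assignment stands in for the true watershed flood; the peak threshold and filter size are illustrative):

```python
import numpy as np
from scipy import ndimage

def separate_crowns(mask, min_peak=3.0):
    """Split a binary canopy mask into individual crown labels.

    Markers are local maxima of the distance transform; every mask pixel is
    then assigned to its nearest marker (a rough stand-in for watershed).
    """
    dist = ndimage.distance_transform_edt(mask)
    local_max = (dist == ndimage.maximum_filter(dist, size=7)) & (dist >= min_peak)
    markers, n = ndimage.label(local_max)
    # Nearest-marker assignment: indices of the closest marker pixel everywhere.
    _, (iy, ix) = ndimage.distance_transform_edt(markers == 0, return_indices=True)
    labels = markers[iy, ix] * mask
    return labels, n

# Two overlapping disks: one connected blob in the mask, but two crowns.
yy, xx = np.mgrid[0:40, 0:80]
mask = ((yy - 20) ** 2 + (xx - 25) ** 2 < 16 ** 2) | ((yy - 20) ** 2 + (xx - 55) ** 2 < 16 ** 2)
labels, n_crowns = separate_crowns(mask)  # n_crowns == 2
```

The real pipeline would use an actual watershed implementation on the inverted distance transform, but the marker-placement logic shown here is the part that controls how adhering crowns get split.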
- Research Article
- 10.12732/ijam.v38i3s.725
- Oct 13, 2025
- International Journal of Applied Mathematics
Satellite images play a vital role in environmental and climate monitoring, modelling hydrological systems, and managing disasters; however, accurate segmentation of water bodies from satellite imagery remains challenging due to variations in resolution, spectral characteristics, and climate conditions. Although a number of segmentation approaches have been proposed, there is still a lack of systematic comparative study of their performance across datasets and of their readiness for real-time operational applications. This systematic literature review (SLR) studies the gradual advancement of water body segmentation from satellite images over the past decade (2015–2024). A total of 1,627 research papers and articles were identified in the initial stage by searching seven major databases, of which 48 high-quality, open-access studies met our inclusion and quality-assessment criteria. This review categorizes water segmentation approaches into conventional image processing approaches; modern approaches such as machine learning and deep learning; and multimodal hybrid approaches. Deep learning methods, such as U-Net variants and attention-based models, consistently outperform traditional methods, achieving average Intersection over Union (IoU) above 92% and precision exceeding 95% on datasets such as Sentinel-2 and Landsat-8, whereas threshold-based and index-based methods achieved average IoUs of 78–85%, reflecting their limitations in complex backgrounds. The most frequently used datasets for water segmentation are Sentinel-2, Landsat-8, Sentinel-1 SAR, WorldView, and GaoFen-2; researchers favor these datasets for their good spatial resolution, spectral diversity, and availability.
Beyond accuracy trends, the review highlights emerging and promising techniques such as multimodal data fusion, lightweight deep networks for edge deployment, and Internet of Things (IoT) frameworks integrated with existing infrastructure for real-time flood monitoring, smart water grids, and water management in applications such as agriculture and power generation. The findings provide a consolidated overview of segmentation approaches and datasets, offering guidance for future research toward scalable, robust, real-time water resource monitoring systems.
- Research Article
- 10.1109/access.2022.3196356
- Jan 1, 2022
- IEEE Access
Brachial plexus block is a common regional anesthesia method widely used in upper limb surgery. Nowadays, ultrasound-guided brachial plexus block is used extensively in clinical anesthesia. However, an accurate brachial plexus block depends heavily on the physician’s experience, and a physician without extensive clinical experience may cause nerve injury when performing a nerve block. With the development of artificial intelligence technology, deep learning methods can automatically identify the brachial plexus in ultrasound images and assist doctors in completing the block accurately and quickly. In this paper, we evaluate the performance of different deep learning models in identifying (i.e., segmenting) the brachial plexus in ultrasound images to explore the best models and training strategies for this task. To this end, we use a new dataset containing 340 brachial plexus ultrasound images annotated by three experienced clinicians. Among the 12 deep learning models we evaluated, U-Net achieves the best segmentation accuracy, with an intersection over union (IoU) of 68.50%. However, U-Net has a very large number of parameters and can only process 15 images per second. By comparison, LinkNet can process 142 images per second and achieves the second-best segmentation accuracy, with an IoU of 66.27%. It strikes a balance between segmentation accuracy and processing efficiency, giving it good potential for real-time brachial plexus segmentation.
- Research Article
- 10.55041/ijsrem48847
- May 27, 2025
- INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
Abstract—Satellite image segmentation is a core process in remote sensing applications that enables land cover classification, urban planning, and environmental monitoring. In this work, we introduce a deep learning segmentation model based on the U-Net architecture for pixel-wise classification of high-resolution satellite images. The model is trained on a satellite image dataset and its corresponding labeled masks to learn geographical features. To further enhance segmentation performance and generalization, we employ data augmentation and hyperparameter tuning. The model is assessed with the Intersection over Union (IoU) metric, scoring approximately 0.8, which indicates high segmentation accuracy. The experimental outcomes show that the U-Net architecture is well suited for satellite image segmentation and provides a promising approach for real-world remote sensing applications. Future work will explore further generalization improvements by adding attention mechanisms and multi-scale feature fusion. Keywords—Remote Sensing, Satellite Imagery, U-Net, Image Segmentation, Deep Learning, Feature Extraction, Semantic Segmentation
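For segmentation tasks like this, the data augmentation mentioned above must be applied jointly to the image and its mask so the pixel-level labels stay aligned. A minimal sketch (the specific transforms are illustrative, not the paper's exact pipeline):

```python
import numpy as np

def augment(image, mask, rng):
    """Random 90-degree rotations and flips, applied identically to both arrays."""
    k = int(rng.integers(0, 4))      # number of 90-degree rotations
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    if rng.random() < 0.5:           # horizontal flip
        image, mask = np.fliplr(image), np.fliplr(mask)
    return image, mask

rng = np.random.default_rng(0)
img = np.arange(16.0).reshape(4, 4)
msk = (img > 7).astype(np.uint8)
aug_img, aug_msk = augment(img, msk, rng)
# Whatever transform was drawn, the pixel-to-label correspondence is preserved.
```

Augmenting the image without the mask (or vice versa) silently corrupts the training signal, which is why the two are transformed in one call.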
- Conference Article
- 10.1109/icase54940.2021.9904133
- Dec 14, 2021
Agricultural field boundary information is vital for crop health monitoring, food security efforts, and precision agriculture. In countries like Denmark and the Netherlands, field parcel information is available, whereas Pakistan lacks such datasets. Danish field boundary data for the year 2018 were selected for training the model. Satellite imagery from four dates was downloaded and preprocessed to capture crop dynamics on the ground. Semantic segmentation architectures were trained on the imagery, and results were assessed using metrics such as Intersection over Union (IoU) and F1-scores. The results show that the U-Net architecture with an SENet154 backbone performs better than other architecture-backbone combinations. Among the imagery dates, data from 27th July achieved a higher IoU score. The method of providing the input mask to the model had the largest impact on the metrics, yielding a 35% increase in IoU. Temporal stacking of multi-date satellite imagery proved an effective way of increasing the information content for boundary delineation and improved IoU by 6.5% compared to a single-date model. The final temporally stacked model had an IoU score of around 0.72. The trained model was able to delineate boundaries and showed good results against the available ground truth. The results of transfer learning to new areas suggest that such techniques have potential, but further factors need to be considered to improve the metrics.
- Research Article
- 10.5194/isprs-archives-xlviii-4-w17-2025-431-2026
- Feb 2, 2026
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. Solar energy has become a major contributor to global renewable energy strategies, offering a sustainable alternative to fossil fuels. Photovoltaic (PV) systems, which convert sunlight into electricity, play a central role in this transition. As the demand for large-scale solar energy projects grows, Geographic Information Systems (GIS) and advanced deep learning models have become critical for accurately detecting and mapping PV installations, particularly from satellite imagery. However, challenges remain, especially in regions with suboptimal satellite data quality. This study focuses on the Marrakesh-Safi region of Morocco, where the potential for solar energy is high but hindered by limitations in available satellite imagery. We employ advanced transformer-based models, including Mask2Former, SegFormer, and DeepLabV3+, to enhance the semantic segmentation of PV systems from high-resolution satellite images. By integrating GIS with these deep learning models, we aim to improve the accuracy and scalability of PV detection, even in complex and diverse geographical settings. Our methodology involves training and testing these models on annotated satellite imagery, with performance evaluated using key metrics such as Intersection over Union (IoU), precision, recall, and F1 score. Mask2Former achieved notable results with a recall of 0.95 and an F1 score of 0.936, excelling in the detection of smaller and more complex PV layouts. DeepLabV3+ demonstrated strong overall performance, with an IoU of 0.89 and precision of 0.93, while also being the most computationally efficient model, processing 28 samples per second. This research highlights the effectiveness of integrating GIS with deep learning, particularly transformer-based architectures, for the accurate detection and mapping of PV systems. 
The results contribute to the broader efforts in renewable energy optimization, supporting more efficient solar energy deployment, especially in regions like Morocco where data quality poses significant challenges.
- Research Article
- 10.1016/j.compag.2023.107862
- Apr 23, 2023
- Computers and Electronics in Agriculture
Detection and infected area segmentation of apple fire blight using image processing and deep transfer learning for site-specific management
- Research Article
- 10.1007/s12524-019-01064-9
- Nov 11, 2019
- Journal of the Indian Society of Remote Sensing
The paper proposes a new method for classifying LISS IV satellite images using deep learning. Deep learning automatically extracts many features without human intervention, and classification accuracy is further improved by including object-based segmentation. The object-based deep feature learning method using a CNN is used to accurately classify remotely sensed images. The method is designed to extract deep features and use them for object-based classification. The proposed system extracts deep features using pre-defined filter values, increasing overall performance compared to randomly initialized filter values. The object-based classification method preserves edge information in complex satellite images, improving classification accuracy while reducing complexity. Here, remotely sensed images were used to classify urban areas of the Ahmadabad and Madurai cities. Experimental results show better performance with the object-based classification.
- Research Article
- 10.3389/fpls.2021.695749
- Jun 29, 2021
- Frontiers in Plant Science
Disease spots on grape leaves can be detected using image processing and deep learning methods; however, detection accuracy and efficiency remain challenges. Convolutional feature information is fuzzy, and detection results are unsatisfactory when the disease spot is relatively small; in particular, detection is difficult if the spot occupies fewer than 32 × 32 pixels in the image. To address this problem effectively, we present a super-resolution image enhancement and convolutional neural network-based algorithm for the detection of black rot on grape leaves. First, the original image is up-sampled and its local details enhanced using bilinear interpolation, increasing the number of pixels in the image. Then, the enhanced images are fed into the proposed YOLOv3-SPP network for detection. In the proposed network, the IoU (Intersection over Union) in the original YOLOv3 network is replaced with GIoU (Generalized Intersection over Union), and an SPP (Spatial Pyramid Pooling) module is added to improve detection performance. Finally, the official pre-trained weights of YOLOv3 are used for fast convergence. The test set test_pv from Plant Village and the test set test_orchard from the orchard field were used to evaluate network performance. The results on test_pv show that grape leaf black rot is detected by YOLOv3-SPP with 95.79% detection accuracy and 94.52% detector recall, which is 5.94% higher in accuracy and 10.67% higher in recall than the original YOLOv3. The results on test_orchard show that the proposed method can be applied in the field environment with 86.69% detection precision and 82.27% detector recall, improving to 94.05% and 93.26% for images with simple backgrounds.
Therefore, the detection method proposed in this work effectively addresses the small-target detection task and improves the detection of grape leaf black rot.
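Replacing IoU with GIoU in YOLOv3, as described above, penalizes non-overlapping box pairs by the empty area of their smallest enclosing box, giving a useful gradient even when boxes do not intersect. A minimal sketch of the GIoU computation itself (not the full YOLOv3 loss):

```python
def giou(a, b):
    """Generalized IoU of boxes (x1, y1, x2, y2): IoU - (|C| - |A∪B|) / |C|,
    where C is the smallest axis-aligned box enclosing both."""
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = area(a) + area(b) - inter
    c = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
    return inter / union - (area(c) - union) / area(c)

g = giou((0, 0, 2, 2), (1, 1, 3, 3))  # 1/7 - 2/9: overlap plus enclosure penalty
```

Unlike IoU, which is zero for all disjoint box pairs, GIoU goes negative as boxes drift apart, which is what makes it usable as a regression loss for small, easily missed targets.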