YOLO-pineapple: enhanced pineapple detection in UAV images using an optimized YOLOv8 model
- Conference Article
6
- 10.1145/3456415.3456424
- Feb 25, 2021
To better manage and protect rivers and lakes, objects on the water surface must be found in time. Generally, image segmentation and target detection are used to detect water surface targets. The former is sensitive to the selection of target features, with poor generalization ability and slow detection speed. The latter has not yet been applied to surface target detection in UAV images. In view of this situation, this paper proposes a target detection model based on YOLOv3 to detect surface targets in UAV images. To verify the performance of the model, the images collected in this paper include five types of surface targets. These images are then augmented by rotation, brightness, and mirror transforms, and the augmented images are used to generate data sets. In the YOLOv3 model, we use an inception module for multi-scale depth features to process the deep features of the network. The module activates multi-scale receptive fields over the deep features, so as to fully utilize them and improve the detection accuracy of small and medium targets in UAV images. In addition, we optimize the loss function to train the network better. The experimental results show that the mAP of the proposed YOLOv3-inception is 81%, the detection speed is 23 frames per second, and the overall performance is better than YOLOv3, Faster RCNN and SSD. Therefore, this method is suitable for surface target detection in UAV images.
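The three augmentations named in this abstract (rotation, brightness, and mirror transforms) can be illustrated with a minimal sketch. It assumes a grayscale image stored as a list of pixel rows with values in 0..255; the function names, the 90-degree rotation, and the brightness factors are hypothetical choices for illustration and are not taken from the paper.

```python
def mirror(image):
    """Horizontal mirror: reverse every pixel row."""
    return [row[::-1] for row in image]


def rotate90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]


def adjust_brightness(image, factor):
    """Scale pixel values by `factor` and clamp them to 0..255."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def mirror_box(box, image_width):
    """Mirror a box (x_min, y_min, x_max, y_max) so it follows a mirrored image."""
    x_min, y_min, x_max, y_max = box
    return (image_width - x_max, y_min, image_width - x_min, y_max)


def augment(image, brightness_factors=(0.7, 1.3)):
    """Return the original image plus mirrored, rotated, and re-lit copies."""
    out = [image, mirror(image), rotate90(image)]
    out += [adjust_brightness(image, f) for f in brightness_factors]
    return out
```

For detection data, any geometric transform also has to be applied to the annotations, which is what `mirror_box` does for the mirror case; brightness changes leave the boxes untouched.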
- Conference Article
125
- 10.1109/wacv48630.2021.00330
- Jan 1, 2021
Existing methods for object detection in UAV images ignore an important challenge, the imbalanced class distribution in UAV images, which leads to poor performance on tail classes. We systematically investigate existing solutions to long-tail problems and show that re-balancing methods that are effective on natural image datasets cannot be trivially applied to UAV datasets. To this end, we rethink long-tailed object detection in UAV images and propose the Dual Sampler and Head detection Network (DSHNet), which is the first work that aims to resolve long-tail distribution in UAV images. The key components in DSHNet include Class-Biased Samplers (CBS) and Bilateral Box Heads (BBH), which are developed to cope with tail classes and head classes in a dual-path manner. Without bells and whistles, DSHNet significantly boosts the performance of tail classes on different detection frameworks. Moreover, DSHNet significantly outperforms base detectors and generic approaches for long-tail problems on the VisDrone and UAVDT datasets. It achieves new state-of-the-art performance when combined with image cropping methods. Code is available at https://github.com/we1pingyu/DSHNet.
- Research Article
- 10.25165/j.ijabe.20241705.8069
- Jan 1, 2024
- International Journal of Agricultural and Biological Engineering
Robust, accurate, and fast monitoring of residual plastic film (RPF) pollution in farmlands has great significance. Based on CBAM-DBNet, this study proposed a threshold-adaptive joint framework for identifying the RPF on farmland surfaces and estimating its coverage rate. UAV imaging was used to gather images of the RPF from several locations with various soil backgrounds. RPFs were manually labeled, and the degree of RPF pollution was defined based on the RPF coverage rate. The differentiable binarization network (DBNet) was combined with the convolutional block attention module (CBAM) to improve its feature extraction module. A dynamic adaptive binarization threshold formula was defined for segmenting the RPF’s approximate binary map. In the RPF image detection branch, CBAM-DBNet achieved a precision (P) of 85.81%, a recall (R) of 82.69%, and an F1-score (F1) of 84.22%, which was 1.09 percentage points higher than DBNet on the comprehensive F1 index. In the RPF image segmentation branch, CBAM-DBNet was used together with the adaptive binarization threshold formula to segment the RPF image. The mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute error (MAE) of the predicted RPF coverage rate were 0.276, 0.366, and 0.605, respectively, outperforming DBNet and the Iterative Threshold method. This study provides a theoretical reference for the further development of evaluation technology for RPF pollution based on UAV imaging.
Keywords: binarization threshold adaptive, residual plastic film, object detection, image segmentation, UAV remote sensing
Citation: Xiong L J, Hu C, Wang X F, Wang H B, Tang X Y, Wang X W. Detection and threshold-adaptive segmentation of farmland residual plastic film images based on CBAM-DBNet. Int J Agric & Biol Eng, 2024; 17(5): 231-238.
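To make the evaluation quantities in this abstract concrete, here is a minimal sketch of a coverage rate and of the three error metrics. It assumes the coverage rate is the fraction of film pixels in a binary mask, which the abstract does not spell out; the function names are hypothetical.

```python
import math


def coverage_rate(mask):
    """Fraction of pixels marked as film (truthy) in a binary mask (list of rows)."""
    total = sum(len(row) for row in mask)
    film = sum(1 for row in mask for p in row if p)
    return film / total


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction; true values must be non-zero."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)
```

When all three metrics are computed on the same predictions and in the same units, MAE can never exceed RMSE, which is a quick sanity check for any evaluation script.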
- Research Article
1
- 10.3389/fpls.2025.1526142
- Aug 28, 2025
- Frontiers in Plant Science
Background: Accurate sorghum spike detection is critical for monitoring growth conditions, accurately predicting yield, and ensuring food security. Deep learning models have improved the accuracy of spike detection thanks to advances in artificial intelligence. However, the dense distribution of sorghum spikes, variable sizes and complex background information in UAV images make detection and counting difficult. Methods: We propose a multiscale and oriented sorghum spike detection and counting model in UAV images (MOSSNet). The model creates a Deformable Convolution Spatial Attention (DCSA) module to improve the network's ability to capture small sorghum spike features. It also integrates Circular Smooth Labels (CSL) to effectively represent morphological features. The model also employs a Wise IoU-based localization loss function to improve network loss. Results: MOSSNet accurately counts sorghum spikes under field conditions, achieving an mAP of 90.3%. MOSSNet shows excellent performance in predicting spike orientation, with RMSEa and MAEa of 14.6 and 12.5 respectively, outperforming other directional detection algorithms. Compared to general object detection algorithms, which output horizontal detection boxes, MOSSNet also demonstrates high efficiency in counting sorghum spikes, with RMSE and MAE values of 9.3 and 8.1, respectively. Discussion: Sorghum spikes have a slender morphology and their orientation angles tend to be highly variable in natural environments. MOSSNet has been shown to handle complex scenes with dense distribution, strong occlusion, and complicated background information. This highlights its robustness and generalizability, making it an effective tool for sorghum spike detection and counting. In the future, we plan to further explore the detection capabilities of MOSSNet at different stages of sorghum growth. This will involve implementing object model improvements tailored to each stage and developing a real-time workflow for accurate sorghum spike detection and counting.
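The Circular Smooth Label idea mentioned in this abstract turns angle regression into classification over angle bins, with a smoothing window that wraps around so that neighbouring angles on either side of the 0/180 boundary stay close. A minimal sketch follows, assuming 180 one-degree bins and a Gaussian window; the parameter values and the function name are illustrative assumptions, not settings reported for MOSSNet.

```python
import math


def circular_smooth_label(angle_deg, num_bins=180, radius=6, sigma=2.0):
    """Build a soft classification target for an orientation angle.

    The target peaks at the angle's bin and decays with a Gaussian window;
    bin indices wrap around, so the last bin and the first bin are neighbours.
    """
    bin_width = 180.0 / num_bins
    center = int(round(angle_deg / bin_width)) % num_bins
    label = [0.0] * num_bins
    for offset in range(-radius, radius + 1):
        weight = math.exp(-(offset ** 2) / (2 * sigma ** 2))
        label[(center + offset) % num_bins] = weight
    return label
```

With this encoding, a prediction of 1 degree for a true angle of 179 degrees is penalised only mildly, whereas a plain one-hot label would treat the two as unrelated classes.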
- Research Article
23
- 10.3390/rs17040685
- Feb 17, 2025
- Remote Sensing
Target detection in UAV images is of great significance in fields such as traffic safety, emergency rescue, and environmental monitoring. However, images captured by UAVs usually have multi-scale features, complex backgrounds, uneven illumination, and low target resolution, which makes target detection in UAV images very challenging. To tackle these challenges, this paper introduces SPDC-YOLO, a novel model built upon YOLOv8. In the backbone, the model eliminates the last C2f module and the final downsampling module, thus avoiding the loss of small target features. In the neck, this paper proposes a novel feature pyramid, SPC-FPN, which employs the SBA (Selective Boundary Aggregation) module to fuse features from two distinct scales. In the head, the P5 detection head is eliminated, and a new detection head, Dyhead-DCNv4, is proposed, replacing DCNv2 in the original Dyhead with DCNv4 and utilizing three attention mechanisms for dynamic feature weighting. In addition, the model uses the CGB (Context Guided Block) module for downsampling, which can learn and fuse local features with surrounding contextual information, and the PPA (Parallelized Patch-Aware Attention) module, which replaces the original C2f module to further improve feature expression capability. Finally, SPDC-YOLO adopts EIoU as the loss function to optimize target localization accuracy. On the public dataset VisDrone2019, the experimental results show that SPDC-YOLO improves mAP50 by 3.4% compared to YOLOv8n while reducing the parameter count by 1.03 M. Compared with other related methods, SPDC-YOLO demonstrates better performance.
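For readers unfamiliar with the EIoU loss named in this abstract, the sketch below follows its commonly cited form: one minus IoU, plus the squared centre distance normalised by the enclosing box diagonal, plus squared width and height differences normalised by the enclosing box width and height. It assumes axis-aligned boxes given as (x1, y1, x2, y2); the function name is hypothetical and this is not the SPDC-YOLO implementation.

```python
def eiou_loss(pred, target, eps=1e-9):
    """EIoU loss between two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection over union.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    enc_w = max(px2, tx2) - min(px1, tx1)
    enc_h = max(py2, ty2) - min(py1, ty1)

    # Squared distance between the two box centres.
    center_dist2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 \
        + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2

    # Squared width and height differences.
    w_diff2 = ((px2 - px1) - (tx2 - tx1)) ** 2
    h_diff2 = ((py2 - py1) - (ty2 - ty1)) ** 2

    return (1.0 - iou
            + center_dist2 / (enc_w ** 2 + enc_h ** 2 + eps)
            + w_diff2 / (enc_w ** 2 + eps)
            + h_diff2 / (enc_h ** 2 + eps))
```

Unlike plain IoU loss, the extra terms still provide a useful signal when the boxes do not overlap at all, which matters for small targets where early predictions often miss entirely.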
- Conference Article
18
- 10.1109/ipta.2019.8936091
- Nov 1, 2019
With the rise of the world population, increasing agricultural productivity has become a necessity for farmers. One way to reduce the cost of chemicals and their environmental impact is to allocate the right doses of herbicide to the right place and at the right time (precision agriculture). Nowadays, automatic weed detection is one of the most challenging problems for precision agriculture, because weeds and crops are hard to discriminate due to their strong similarities. One of the approaches used for weed detection is machine learning. What machine learning algorithms have in common is the need for training data. In this article we propose to use deep features and one-class classification on unsupervised data for weed detection in UAV images. The results show that one-class classification can be comparable to the literature and also to a deep learning model trained with supervised training data labeling. Results obtained on all test datasets can be up to 90% depending on the data used for the training.
- Research Article
23
- 10.1109/jstars.2024.3373231
- Jan 1, 2024
- IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Object detection in UAV images is an important and challenging task for many applications, which often need highly efficient detection algorithms to meet their accuracy and real-time requirements. In this paper, we investigate efficient mechanisms for detecting dense and small objects in UAV images. Specifically, 1) kernel K-means is used to obtain optimal anchors for dense and small object detection; 2) a spatial information enhancement module (SIE) is proposed to improve the detection accuracy of dense objects by extracting object spatial location information; 3) a Coord_C3 module is proposed to improve the receptive field of the network and to reduce the number of network parameters; 4) a small detection head is added in the head of the network and skip connections are employed in the neck of the network to improve the detection accuracy of small objects. Experimental results on the VisDrone2019, LEVIR-ship and Stanford Drone datasets show that our method not only has higher detection accuracy, but also runs faster compared to state-of-the-art detection methods.
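Anchor selection by clustering, as in step 1) of this abstract, can be sketched with ordinary k-means over box widths and heights using IoU as the similarity. Note the swap: the paper uses kernel K-means, while the sketch below is the plain IoU-based variant popularised by earlier YOLO versions, with a deterministic initialisation chosen here only to keep the example reproducible. All names are hypothetical.

```python
def wh_iou(a, b):
    """IoU of two boxes described only by (width, height), aligned at one corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iters=50):
    """Cluster (width, height) pairs into k anchors, returned sorted by area."""
    # Deterministic initialisation: boxes evenly spaced in the area ranking.
    by_area = sorted(boxes, key=lambda wh: wh[0] * wh[1])
    step = len(by_area) / k
    anchors = [by_area[int(i * step + step / 2)] for i in range(k)]

    for _ in range(iters):
        # Assign every box to the anchor it overlaps most.
        clusters = [[] for _ in range(k)]
        for wh in boxes:
            best = max(range(k), key=lambda j: wh_iou(wh, anchors[j]))
            clusters[best].append(wh)

        # Move each anchor to the mean width and height of its cluster.
        new_anchors = []
        for j in range(k):
            if clusters[j]:
                n = len(clusters[j])
                new_anchors.append((sum(w for w, _ in clusters[j]) / n,
                                    sum(h for _, h in clusters[j]) / n))
            else:
                new_anchors.append(anchors[j])  # keep an empty cluster's anchor

        if new_anchors == anchors:
            break
        anchors = new_anchors

    return sorted(anchors, key=lambda wh: wh[0] * wh[1])
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the clustering, which is the usual motivation when the dataset is full of small objects.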
- Research Article
1
- 10.1038/s41598-025-19145-w
- Sep 29, 2025
- Scientific Reports
In UAV-based downstream tasks, intelligent interpretation of UAV images demands higher real-time performance and accuracy. However, achieving high-precision, real-time object detection in UAV images poses significant challenges due to the prevalence of small objects (e.g., persons and bicycles), uneven target distribution, occlusion, and other factors. Current UAV object detection algorithms lack comprehensive solutions to the multifaceted challenges encountered in real-world deployment scenarios, resulting in suboptimal performance. Moreover, direct application of mainstream real-time detection algorithms like the YOLO series to UAV images leads to a significant performance drop. To address these issues, this paper presents an enhanced real-time object detection network named YOLO-UD, which is built upon the YOLO11 architecture. Our approach aims to achieve superior feature representation through the effective integration of contextual information and adaptive multi-scale fusion. Specifically, we incorporate a novel C3kHR module, which employs dilated convolutions with varying rates to capture contextual information across multiple levels of granularity, enabling richer multi-scale feature representation. Additionally, an efficient adaptive feature fusion network (EAFN) is designed to filter and prioritize key information from multi-scale feature layers and flexibly provide the detection head with the information needed for the detection process. A small object detection layer (SMDL) is also introduced to enhance the detection of small objects and provide rich information about small targets. Finally, extensive experiments on the VisDrone2019 and UAVDET datasets demonstrate that YOLO-UD achieves an excellent balance between accuracy and inference speed, validating its effectiveness.
- Research Article
59
- 10.1016/j.isprsjprs.2023.04.009
- Apr 25, 2023
- ISPRS Journal of Photogrammetry and Remote Sensing
OGMN: Occlusion-guided multi-task network for object detection in UAV images
- Research Article
62
- 10.1016/j.compag.2022.107087
- Jun 7, 2022
- Computers and Electronics in Agriculture
A deep learning method for oriented and small wheat spike detection (OSWSDet) in UAV images
- Research Article
200
- 10.3390/rs13163095
- Aug 5, 2021
- Remote Sensing
Deep-learning-based object detection algorithms have significantly improved the performance of wheat spike detection. However, UAV images crowded with small-sized, highly dense, and overlapping spikes cause detection accuracy to decrease. This paper proposes an improved YOLOv5 (You Only Look Once)-based method to detect wheat spikes accurately in UAV images and to resolve the false and missed spike detections caused by occlusion. The proposed method introduces data cleaning and data augmentation to improve the generalization ability of the detection network. The network is rebuilt by adding a microscale detection layer, setting prior anchor boxes, and adapting the confidence loss function of the detection layer based on the IoU (Intersection over Union). These refinements improve the feature extraction for small-sized wheat spikes and lead to better detection accuracy. With the confidence weights, the detection boxes in multiresolution images are fused to increase the accuracy under occlusion conditions. The results show that the proposed method is better than existing object detection algorithms, such as Faster RCNN, Single Shot MultiBox Detector (SSD), RetinaNet, and standard YOLOv5. The average precision (AP) of wheat spike detection in UAV images is 94.1%, which is 10.8% higher than the standard YOLOv5. Thus, the proposed method is a practical way to handle spike detection in complex field scenarios and provides technical references for field-level wheat phenotype monitoring.
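The fusion of detection boxes by confidence weights described in this abstract can be sketched as follows. The abstract does not give the exact procedure, so the grouping rule (greedy clustering by IoU against the highest-confidence box), the threshold, and the averaged output confidence are all assumptions made for illustration, in the spirit of weighted box fusion.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse_boxes(detections, iou_thr=0.5):
    """Fuse overlapping detections (x1, y1, x2, y2, conf) from several image scales.

    Detections are grouped greedily around the most confident box in each
    group, then each group is replaced by a confidence-weighted average box.
    """
    clusters = []
    for det in sorted(detections, key=lambda d: d[4], reverse=True):
        for cluster in clusters:
            if iou(cluster[0][:4], det[:4]) >= iou_thr:
                cluster.append(det)
                break
        else:
            clusters.append([det])

    fused = []
    for cluster in clusters:
        total = sum(d[4] for d in cluster)
        coords = tuple(sum(d[i] * d[4] for d in cluster) / total for i in range(4))
        fused.append(coords + (total / len(cluster),))
    return fused
```

Compared with non-maximum suppression, which discards all but one box per group, averaging lets lower-confidence detections from other resolutions still refine the final box position.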
- Research Article
27
- 10.1080/03772063.2021.1962418
- Aug 17, 2021
- IETE Journal of Research
In this paper, an ensemble deep transfer learning (EDTL) method based on Faster R-CNN is introduced for vehicle detection in UAV images. We perform weighted-averaging ensemble transfer learning comprising six base learners that use ResNet50 pre-trained on the ImageNet dataset. The weights of the six base learners as well as the final decision threshold are adaptively optimized via a genetic algorithm to maximize the total accuracy, precision, and recall. Simulation results on the AU-AIR dataset demonstrate the superiority of the EDTL over existing techniques in terms of the total accuracy and the trade-off between precision and recall.
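The decision rule of a weighted-averaging ensemble such as the one in this abstract can be sketched in a few lines. The example assumes each base learner outputs one confidence score per candidate detection; the genetic-algorithm search for the weights and threshold is left out, and the function names are hypothetical.

```python
def ensemble_score(scores, weights):
    """Weighted average of per-learner confidence scores for one candidate."""
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per base learner")
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def ensemble_detect(scores, weights, threshold):
    """Accept the candidate if the weighted average reaches the threshold."""
    return ensemble_score(scores, weights) >= threshold
```

In the setup described above, the six weights and the threshold would be the variables tuned by the optimizer, with accuracy, precision, and recall on validation data as the objective.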
- Research Article
10
- 10.3390/s22239171
- Nov 25, 2022
- Sensors
The large view angle and complex background of UAV images make small pedestrian targets difficult to detect, so they are easily misdetected or missed. In addition, object detection models based on deep learning are usually complex, and their high computational resource consumption limits the application scenarios. For small pedestrian detection in UAV images, this paper proposes an improved YOLOv5 method that introduces a new small object feature detection layer in the feature fusion layer. Experiments show that the improved method raises the average precision by 4.4%, which effectively improves pedestrian detection. To address the problem of high computational resource consumption, the model is compressed using channel pruning to reduce the consumption of video memory and computing power during inference. Experiments show that the model can be compressed to 11.2 MB and that its GFLOPs are reduced by 11.9% compared with the uncompressed model while inference accuracy remains unchanged, which is significant for the deployment and application of the model.
- Research Article
26
- 10.3390/s22186993
- Sep 15, 2022
- Sensors (Basel, Switzerland)
UAV-based object detection has recently attracted a lot of attention due to its diverse applications. Most existing convolutional neural network-based object detection models perform well in common object detection cases. However, because objects in UAV images are very densely distributed, these methods have limited performance for UAV-based object detection. In this paper, we propose a novel transformer-based object detection model to improve the accuracy of object detection in UAV images. To detect dense objects competently, an advanced foreground enhancement attention Swin Transformer (FEA-Swin) framework is designed by integrating context information into the original backbone of a Swin Transformer. Moreover, to avoid the loss of information of small objects, an improved weighted bidirectional feature pyramid network (BiFPN) is presented by designing the skip connection operation. The proposed method aggregates feature maps from four stages and keeps abundant information of small objects. Specifically, to balance the detection accuracy and efficiency, we introduce an efficient neck of the BiFPN network by removing a redundant network layer. Experimental results on both public datasets and a self-made dataset demonstrate the performance of our method compared to the state-of-the-art methods in terms of detection accuracy.
- Research Article
8
- 10.3390/agriculture15151653
- Jul 31, 2025
- Agriculture
Accurate detection of maize tassels plays a crucial role in yield estimation of maize in precision agriculture. Recently, UAV and deep learning technologies have been widely introduced in various applications of field monitoring. However, complex field backgrounds pose multiple challenges to the precise detection of maize tassels, including multi-scale variations caused by varietal differences and growth stage variations, intra-class occlusion, and background interference. To achieve accurate maize tassel detection in UAV images under complex field backgrounds, this study proposes an MSMT-RTDETR detection model. The Faster-RPE Block is first designed to enhance multi-scale feature extraction while reducing the model's parameter count and FLOPs. To improve detection performance for multi-scale targets in complex field backgrounds, a Dynamic Cross-Scale Feature Fusion Module (Dy-CCFM) is constructed by upgrading the CCFM through dynamic sampling strategies and a multi-branch architecture. Furthermore, the MPCC3 module is built via re-parameterization methods and further strengthens cross-channel information extraction capability and model stability to deal with intra-class occlusion. Experimental results on the MTDC-UAV dataset demonstrate that the MSMT-RTDETR significantly outperforms the baseline in detecting maize tassels under complex field backgrounds, where a precision of 84.2% was achieved. Compared with Deformable DETR and YOLOv10m, improvements of 2.8% and 2.0% were achieved, respectively, in the mAP50 for UAV images. This study proposes an innovative solution for accurate maize tassel detection, establishing a reliable technical foundation for maize yield estimation.