Enhanced Detection of Small Objects in Aerial Imagery: A High-Resolution Neural Network Approach With Amplified Feature Pyramid and Sigmoid Re-Weighting
Detecting small objects in drone-captured images or aerial videos is challenging due to their minimal representation. As data traverses a deep network, information about small objects can diminish, making high-resolution input essential for strong detection performance. However, high-resolution images undesirably increase the computational load. To balance these demands, we propose a streamlined neural network designed specifically for small object detection in high-resolution images. The proposed network comprises three main components: i) the Enhanced High-Resolution Processing Module (EHRPM); ii) the Small Object Feature Amplified Feature Pyramid Network (SOFA-FPN), with its Edge Enhancement Module (EEM), Cross Lateral Connection Module (CLCM), and Dual Bottom-up Convolution Module (DBCM); and iii) the Sigmoid Re-weighting Module (SRM). Compared with several state-of-the-art networks, our method delivers superior performance with fewer parameters and lower computational demand. The source code is available at https://github.com/datu0615/EHRPM.
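The abstract does not detail the SRM's internals; as a rough illustration, sigmoid re-weighting can be sketched as a per-channel gate computed from globally pooled features. This is a minimal numpy sketch, not the paper's implementation, and `sigmoid_reweight`, `w`, and `b` are illustrative names:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_reweight(features, w, b):
    """Re-weight each channel of a (C, H, W) feature map by a sigmoid gate.

    The gate is computed from the global average of each channel, so
    informative channels can be emphasized and weak ones suppressed.
    """
    pooled = features.mean(axis=(1, 2))     # (C,) global average pool
    gates = sigmoid(w * pooled + b)         # (C,) weights in (0, 1)
    return features * gates[:, None, None]  # broadcast over H and W
```

Because the gates lie strictly in (0, 1), the module can only attenuate or pass channels, never amplify them beyond the input; a learned `w` and `b` decide which channels survive.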
- Research Article
5
- 10.3390/app132111760
- Oct 27, 2023
- Applied Sciences
To address the large number of objects and the high proportion of small objects in aerial drone imagery, we propose an aerial dense small object detection algorithm, Global Normalization Attention Mechanism You Only Look Once (GNYL). In the backbone network of GNYL, we embed a Global Normalization Attention Mechanism (GNAM) that extracts channel attention and spatial attention features from the input in a concatenated manner, using batch normalization's scale factors to suppress irrelevant channels or pixels. The spatial attention sub-module introduces a three-dimensional arrangement with a multi-layer perceptron to reduce information loss and amplify global interaction representation. The computed attention weights are then combined into global normalized attention weights, which increases the utilization of effective information across the input feature channels and spatial dimensions. We optimized the backbone network, feature enhancement network, and detection heads to improve detection accuracy while keeping the network lightweight; in particular, we added a small object detection layer to improve localization accuracy for the abundant small objects in aerial imagery. The algorithm was evaluated on the publicly available VisDrone2019 dataset. Compared with the baseline network YOLOv8l, GNYL achieves a 7.2% improvement in mAP0.5 and a 5.0% improvement in mAP0.95; compared with CDNet, it achieves a 14.5% improvement in mAP0.5 and a 9.1% improvement in mAP0.95. These results demonstrate the strong practicality of GNYL for detecting dense small objects in aerial imagery captured by unmanned aerial vehicles.
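The batch-norm scale-factor idea behind GNAM's channel attention can be loosely sketched in numpy: channels whose learned BN scale |gamma| is small receive a small weight and are suppressed. The paper's exact formulation may differ, and `gnam_channel_gate` is a hypothetical name:

```python
import numpy as np

def gnam_channel_gate(x, gamma):
    """Channel gate driven by batch-norm scale factors (loose sketch).

    x: (C, H, W) feature map; gamma: (C,) learned BN scale factors.
    Channels with small |gamma| are treated as less informative and
    get a weaker sigmoid gate.
    """
    w = np.abs(gamma) / np.abs(gamma).sum()  # normalized channel importance
    gated = x * w[:, None, None]             # weight each channel
    return x / (1.0 + np.exp(-gated))        # x * sigmoid(gated)
```

A channel with gamma near zero receives gate sigmoid(0) = 0.5, while important channels are gated closer to their input magnitude.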
- Conference Article
10
- 10.1109/ubmk52708.2021.9558923
- Sep 15, 2021
Object detection and tracking from airborne imagery has drawn attention alongside the parallel development of UAV systems and computer vision technologies. Aerial imagery poses challenges that differ from the training sets of modern object detectors: the images cover much larger areas than regular datasets, and the objects within them are correspondingly very small. These problems prevent the direct use of common object detection models. The main purpose of this paper is to modify the Faster R-CNN (FRCNN) model and leverage it for small object detection and tracking in aerial imagery, using both spatial and temporal information from the image sequence, since appearance information alone is insufficient. The anchors in the Region Proposal Network (RPN) stage are adjusted for small objects, and the intersection over union (IoU) criterion is likewise optimized for small objects. After improving detection performance, the DeepSORT algorithm is inserted right after the Region of Interest (ROI) head to track the objects. The results show that the proposed model performs well on the VisDrone-2019 dataset: detection performance is considerably better than the original FRCNN and the algorithms evaluated in the VisDrone-2019 VID challenge. With the proposed modifications, the AP-AP50 values rose from 8.08-18.70 to 14.07-29.41, an improvement of approximately 75%.
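The IoU criterion being tuned here is the standard one; a plain implementation makes clear why small boxes are so sensitive to it, since even a one-pixel misalignment of a 2x2 box drops IoU to 1/7:

```python
def iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])  # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```

For large boxes the same one-pixel offset barely changes the ratio, which is why anchor sizes and IoU thresholds chosen for regular datasets underperform on small objects.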
- Research Article
26
- 10.3390/s22124339
- Jun 8, 2022
- Sensors (Basel, Switzerland)
One common issue in object detection in aerial imagery is the small size of objects relative to the overall image, mainly caused by the high camera altitude and the wide-angle lenses commonly used on drones to maximize coverage. State-of-the-art general-purpose object detectors tend to underperform on small objects due to the loss of spatial features, the weak feature representation of small objects, and the sheer imbalance between objects and background. This paper addresses small object detection in aerial imagery with a Convolutional Neural Network (CNN) model that uses the Single Shot multi-box Detector (SSD) as the baseline network and extends its small object detection performance with feature enhancement modules, including super-resolution, deconvolution, and feature fusion. These modules collectively improve the feature representation of small objects at the prediction layer. The performance of the proposed model is evaluated on three datasets, including two aerial image datasets consisting mainly of small objects, and the model is compared with state-of-the-art small object detectors. Experimental results demonstrate improvements in mean Average Precision (mAP) and Recall over the state-of-the-art small object detectors investigated in this study.
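The deconvolution-plus-fusion idea can be sketched as upsampling a deep, coarse feature map to the resolution of a shallow, fine one and combining them element-wise. Nearest-neighbor upsampling stands in for the learned deconvolution here; this is a schematic, not the paper's module:

```python
import numpy as np

def fuse_features(shallow, deep):
    """Fuse a deep low-resolution map into a shallow high-resolution one.

    shallow: (C, H, W); deep: (C, H/s, W/s). The deep map is upsampled
    (nearest-neighbor, standing in for a learned deconvolution) and added
    element-wise, enriching the layer where small objects are predicted.
    """
    scale_h = shallow.shape[1] // deep.shape[1]
    scale_w = shallow.shape[2] // deep.shape[2]
    up = deep.repeat(scale_h, axis=1).repeat(scale_w, axis=2)
    return shallow + up
```

The shallow layer keeps its spatial detail while inheriting the deeper layer's semantics, which is the core of such feature enhancement modules.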
- Conference Article
4
- 10.1109/fit53504.2021.00014
- Dec 1, 2021
Super-resolution (SR) is an outstanding technique for enhancing applications associated with aerial and remote-sensing imagery; tasks such as classification, segmentation, and object detection can benefit significantly from well-performing SR models. Extensive research on SR is being done for both ground-level and aerial imagery, where convolutional neural networks (CNNs) have made incredible progress. Numerous deep CNNs use attention mechanisms in their architectures, one of which is Squeeze-and-Excitation (SE) inter-channel attention. Although the SE block has enhanced the performance of many models, there is no residual mechanism within its structure. In this paper we therefore propose the Squeeze-and-Residual-Excitation (SRE) attention block, which improves upon the SE block by adding a residual mechanism within its structure to deliver a performance gain on the SR task. Based on our SRE attention mechanism, we propose an enhanced SR framework for remote-sensing imagery, the Squeeze-and-Residual-Excitation Holistic Attention Network (SRE-HAN), which outperforms other attention-based deep SR models at two levels of resolution enhancement (4x and 8x upsampling) on two diverse aerial imagery datasets: the Satellite Imagery Multi-Vehicles Dataset (SIMD), consisting of 5000 high-resolution (HR) aerial images, and Cars-Overhead-With-Context (COWC). Furthermore, using the YOLOv5 object-detection model, we carry out multiple experiments to substantiate the effectiveness of these SR models for object detection on SIMD.
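The abstract does not spell out where the residual enters the SE block; one plausible reading, sketched in numpy under that assumption (`sre_block`, `w1`, `w2` are illustrative), is standard squeeze-excite channel scaling with the input added back:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sre_block(x, w1, w2):
    """Squeeze-and-Residual-Excitation, one possible reading of the SRE idea.

    x: (C, H, W); w1: (C//r, C); w2: (C, C//r) with reduction ratio r.
    Standard SE does squeeze (global pool), excite (FC-ReLU-FC-sigmoid),
    then channel scaling; the residual variant adds the input back so the
    block can never fully suppress a channel.
    """
    s = x.mean(axis=(1, 2))                  # squeeze: (C,)
    e = sigmoid(w2 @ np.maximum(w1 @ s, 0))  # excite: gates in (0, 1)
    return x * e[:, None, None] + x          # scale + residual connection
```

With the residual, the output of each channel is x * (1 + e) rather than x * e, so the gate modulates around the identity instead of gating to zero.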
- Conference Article
30
- 10.1109/iceic49074.2020.9051269
- Jan 1, 2020
Deep learning has successfully solved many computer vision problems, sometimes in conjunction with traditional computer vision methods and sometimes by replacing them. In this paper, we aim to solve the object detection problem using methods from both deep learning and computer vision. A significant amount of work has been done on generic object detection, where objects (foreground) usually cover the majority of the image compared with the background. Here we focus instead on detecting small objects that constitute a tiny area relative to the background, as in aerial imagery where objects of interest such as people and cars tend to appear relatively small. Such images have an intrinsic class-imbalance problem because background samples dominate object samples. We propose an anchor optimization method that reduces unnecessary region proposals and can generate anchors customized to the dataset; it can be used in conjunction with any single-stage object detection framework. It is empirically noted that this anchor optimization technique improves accuracy over baseline frameworks.
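The paper's exact anchor-optimization procedure is not given in the abstract; a common recipe with the same goal, deriving anchors from the dataset's box statistics rather than generic defaults, is k-means over ground-truth width-height pairs (a sketch, not necessarily the authors' method):

```python
import numpy as np

def kmeans_anchors(wh, k, iters=50, seed=0):
    """Cluster ground-truth (w, h) pairs into k anchor shapes.

    wh: (N, 2) float array of box widths and heights. Returns (k, 2)
    cluster centers to use as anchor sizes, so anchors match the
    dataset's actual object scales instead of generic defaults.
    """
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        # assign each box to its nearest center, then recompute centers
        d = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = wh[labels == j].mean(axis=0)
    return centers
```

On a dataset dominated by small objects, the resulting centers skew small, which directly reduces wasted region proposals at irrelevant scales.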
- Conference Article
38
- 10.1109/dsc.2018.00052
- Jun 1, 2018
In computer vision, significant advances have been made in recent years in object recognition and detection with the rapid development of deep learning, especially deep convolutional neural networks (CNNs). Most deep learning methods for object detection have been developed for large objects, and their performance on small-object detection is not very good. This paper contributes to research on low-resolution small-object detection by evaluating the performance of leading deep learning methods on a common dataset: a new bird-detection dataset called Little Birds in Aerial Imagery (LBAI), created from real-life aerial imagery data. LBAI contains birds with sizes ranging from 10 px to 40 px. In our experiments, some of the best deep learning architectures were implemented and applied to LBAI, including the object detection techniques YOLOv2, SSH, and Tiny Face, as well as the small instance segmentation techniques U-Net and Mask R-CNN. Among the object detection methods, experimental results demonstrated that SSH performed best for easy cases, whereas Tiny Face performed best for hard cases, i.e., where a cluttered background makes detecting birds difficult. Among the small instance segmentation methods, experimental results revealed that U-Net achieved slightly better performance than Mask R-CNN.
- Research Article
594
- 10.1016/j.eswa.2021.114602
- Jan 19, 2021
- Expert Systems with Applications
A survey and performance evaluation of deep learning methods for small object detection
- Research Article
9
- 10.3390/s23198118
- Sep 27, 2023
- Sensors (Basel, Switzerland)
In the field of aerial remote sensing, detecting small objects in aerial images is challenging: their subtle presence against broad backgrounds, combined with environmental complexity and low image resolution, complicates identification. While their detection is crucial for urban planning, traffic monitoring, and military reconnaissance, many deep learning approaches demand significant computational resources, hindering real-time applications. To raise the accuracy of small object detection in aerial imagery while meeting real-time requirements, we introduce SenseLite, a lightweight and efficient model tailored for aerial image object detection. First, we restructured the YOLOv5 model to be more streamlined. In the backbone, we replaced the original structure with the cutting-edge lightweight neural operator Involution, enhancing contextual semantics and weight distribution. For the neck, we incorporated GSConv and slim-Neck, striking a balance between reduced computational complexity and performance, which is ideal for rapid prediction. To further improve detection accuracy, we integrated a squeeze-and-excitation (SE) mechanism to amplify channel communication. Finally, the Soft-NMS strategy was employed to manage overlapping targets, ensuring precise concurrent detections. Performance-wise, SenseLite reduces parameters by 30.5%, from 7.05 M to 4.9 M, and computational demand, with GFLOPs decreasing from 15.9 to 11.2. It surpasses the original YOLOv5 with a 5.5% mAP0.5 improvement, 0.9% higher precision, and 1.4% better recall on the DOTA dataset. Compared with other leading methods, SenseLite stands out in terms of performance.
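The Soft-NMS step mentioned above is a standard technique; a minimal Gaussian-decay variant looks like this. Scores of overlapping boxes are decayed rather than zeroed, so densely packed true positives survive:

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay overlapping scores instead of discarding.

    Returns the kept indices (highest score first) and the decayed scores.
    """
    scores = scores.astype(float).copy()
    keep, idxs = [], list(range(len(boxes)))
    while idxs:
        best = max(idxs, key=lambda i: scores[i])
        idxs.remove(best)
        keep.append(best)
        for i in idxs:
            o = iou(boxes[best], boxes[i])
            scores[i] *= np.exp(-o * o / sigma)     # Gaussian penalty
        idxs = [i for i in idxs if scores[i] >= score_thresh]
    return keep, scores
```

Hard NMS would delete any box above a fixed IoU threshold outright; here the penalty grows smoothly with overlap, which is why it helps in dense small-object scenes.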
- Research Article
3
- 10.5194/isprs-archives-xli-b7-229-2016
- Jun 21, 2016
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. Since satellite and aerial imagery have recently become widely available and frequently acquired, combining them is expected to let their spatial and temporal resolutions complement each other. One prospective application is traffic monitoring, where the objects of interest, vehicles, must be recognized automatically. Techniques that apply object detection before object recognition can save computational time and cost, and thus play a significant role. However, it is not well understood whether object detection methods perform well on satellite and aerial imagery, or how the characteristics of such imagery affect detection performance. This study employs the binarized normed gradients (BING) method, which runs significantly fast and is robust to rotation and noise. For our experiments, 11-bit BGR-IR satellite imagery from WorldView-3 and BGR-color aerial imagery are used, and we create thousands of ground truth samples. We conducted several experiments to compare performance across different images, to verify whether combining images of different resolutions improved performance, and to analyze the applicability of mixing satellite and aerial imagery. The results showed that the infrared band had little effect on the detection rate, that 11-bit images performed worse than 8-bit images, and that better spatial resolution brought better performance. Another result may imply that mixing higher- and lower-resolution images in the training dataset can help detection performance. Furthermore, we found that aerial images improved detection performance on satellite images.
- Research Article
2
- 10.3390/rs16203753
- Oct 10, 2024
- Remote Sensing
Detecting tiny objects in aerial imagery presents a major challenge owing to their limited resolution and size. Existing research predominantly focuses on evaluating average precision (AP) across various detection methods, often neglecting computational efficiency; furthermore, state-of-the-art techniques can be complex and difficult to understand. This paper introduces a comprehensive benchmarking analysis specifically tailored to enhancing small object detection on the DOTA dataset, focusing on one-stage detection methods. We propose a novel data-processing approach that improves the overall AP for all classes in the DOTA-v1.5 dataset using the YOLOv8 framework, building on YOLOv8's darknet architecture, a proven backbone for object detection tasks. To optimize performance, we introduce pre-processing techniques, including data formatting, noise handling, and normalization, that improve the representation of small objects and their detectability. Extensive experiments on the DOTA-v1.5 dataset demonstrate the superiority of the proposed approach in overall class mean average precision (mAP), achieving 66.7%. Additionally, our method establishes a new benchmark for computational efficiency and speed. This advancement not only improves small object detection but also lays a foundation for future research and applications in aerial imagery analysis, paving the way for more efficient and effective detection techniques.
- Book Chapter
2
- 10.1007/978-3-030-03766-6_76
- Dec 25, 2018
For small object detection on a UAV (Unmanned Aerial Vehicle) platform, a confidence description of the moving object is proposed to improve the accuracy, robustness, and reliability of detection and tracking. Small moving objects in aerial video have low resolution and slow motion, the images are easily degraded by illumination changes and camera-jitter noise, and the correlation between video frames is often neglected, which leads to false detections, low accuracy, and poor robustness. For UAV video with small moving objects, the algorithm uses the ORB operator to extract reliable global feature points from each frame, performs global motion compensation on the background through an affine transformation model, and computes the difference image, whose energy accurately localizes the small object; the confidence of the moving object is then described. An n-step back-off method increases the correlation information between video frames. The proposed method was evaluated on video captured from an airborne platform in extensive experiments and tests. For objects as small as 25 pixels, the method still performs well, and it can be realized with parallel computing in real time, processing 1280 × 720 frames at around 45 fps.
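The compensate-then-difference core of such a pipeline can be sketched without the ORB stage: assuming the global camera motion has already been estimated (here a pure translation stands in for the paper's affine model, and `motion_mask` is an illustrative name), background compensation followed by thresholded differencing isolates the independently moving object:

```python
import numpy as np

def motion_mask(prev, cur, shift, thresh=30):
    """Background-compensated frame difference.

    prev, cur: (H, W) grayscale frames; shift: estimated global camera
    motion (dy, dx), standing in for the affine model fitted to ORB
    matches in the paper. Pixels whose compensated difference exceeds
    `thresh` are candidate moving-object pixels.
    """
    dy, dx = shift
    comp = np.roll(prev, (dy, dx), axis=(0, 1))    # compensate camera motion
    diff = np.abs(cur.astype(int) - comp.astype(int))
    return diff > thresh                           # candidate moving pixels
```

After compensation the static background cancels in the difference image, so even a few-pixel object stands out as the dominant energy, which is what the confidence description is built on.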
- Research Article
2
- 10.5194/isprs-archives-xliii-b2-2022-657-2022
- May 30, 2022
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. In a time of continuous technological development, deep machine learning (more precisely, convolutional neural networks), one of the branches of artificial intelligence (AI), has found wide application in many fields, including photogrammetry and remote sensing. One area where much research is conducted with these methods is the recognition of objects in aerial and satellite imagery: applying deep learning algorithms and neural networks makes it possible to automate labour-intensive processes. However, while object detection in images using machine learning is popular for natural scenes, and in recent years also for nadir aerial and satellite imagery, relatively few publications addressed oblique aerial imagery at the time of this research. A challenging task in object detection is the time-consuming generation of training datasets when data access is limited or non-existent. This study proposes a methodology to automate this process, using existing resources to transfer references to new databases for training models that detect objects in oblique aerial images. Object detection was performed with the YOLOv3 neural network. Experimental results on two datasets show that the proposed method can accomplish object detection in oblique aerial images.
- Research Article
46
- 10.3390/rs11182176
- Sep 18, 2019
- Remote Sensing
Detecting objects in aerial images is a challenging task due to the multiple orientations and relatively small size of the objects. Although many traditional detection models achieve acceptable performance using an image pyramid and multiple templates in a sliding-window manner, such techniques are inefficient and costly. Recently, convolutional neural networks (CNNs) have been used successfully for object detection and have demonstrated considerably superior performance to traditional detection methods; however, this success has not extended to aerial images. To overcome these problems, we propose a detection model based on two CNNs. The first CNN proposes many object-like regions generated from feature maps of multiple scales and hierarchies, together with orientation information. With this design, the positioning of small objects becomes more accurate, and the generated regions, carrying orientation information, are better suited to objects arranged at arbitrary orientations. The second CNN performs object recognition: it first extracts the features of each generated region and subsequently makes the final decision. Results of extensive experiments on the Vehicle Detection in Aerial Imagery (VEDAI) and Overhead Imagery Research Data Set (OIRDS) datasets indicate that the proposed model performs well in terms of both detection accuracy and detection speed.
- Research Article
1
- 10.1088/1361-6501/adf136
- Aug 1, 2025
- Measurement Science and Technology
Small object detection presents challenges across many domains, with UAV aerial image detection being particularly significant and complex. Detection accuracy is primarily limited by the high density of small objects, substantial object scale variation, and background complexity. Moreover, existing object detection algorithms exhibit deficiencies in feature retention and multi-scale feature fusion, limiting detection performance in intricate scenes. To address these challenges, this paper proposes an innovative multi-dimensional feature enhancement and multi-scale feature adaptive aggregation and diffusion small object detection network (MFEAD-SODNet) for UAV aerial images. First, a backbone network integrating edge and spatial feature enhancement is developed to strengthen feature representation from multiple perspectives, improving small object recognition accuracy and detection performance. Second, the multi-scale feature adaptive aggregation and diffusion feature pyramid network (MFAD-FPN) is introduced; it preserves multi-scale information through adaptive feature fusion driven by channel selection, and employs a cross-layer feature aggregation and adjacent-layer feature diffusion mechanism to shorten feature transfer paths and minimize information loss during propagation. Finally, a lightweight shared detail-enhanced detection head is proposed to balance computational complexity while enhancing detailed feature representation. To evaluate the proposed algorithm, experiments were conducted on the VisDrone2019 baseline dataset. Compared with the baseline model, MFEAD-SODNet improves mean average precision (mAP)@0.5 and mAP@0.5:0.95 by 7.6% and 5.1%, respectively, while reducing the number of parameters by 23.3%. The effectiveness and generalization of MFEAD-SODNet for small object detection were further validated on additional public and self-built datasets.