SODA: A Dataset for Small Object Detection in UAV Captured Imagery

Abstract

Modern Unmanned Aerial Vehicles (UAVs) are equipped with high-resolution cameras that can capture imagery and video of whatever lies in their field of view. However, depending on the altitude of the UAV, objects within the imagery appear small, sometimes covering just a few pixels, which makes detecting them very challenging. The SODA (Small Objects at Different Altitudes) dataset was created to support research on small object detection in aerial imagery. The dataset contains 377 aerial images captured by UAVs at altitudes ranging from 5 m to 30 m. The dataset is then used to evaluate the performance of YOLOv8. The SODA dataset is publicly available on GitHub¹.
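The link between flight altitude and apparent object size can be made concrete via the ground sample distance (GSD): the ground footprint of a single pixel grows linearly with altitude, so the same object spans proportionally fewer pixels from higher up. A minimal sketch of that relationship, using illustrative camera parameters (the 8.8 mm focal length and 2.4 µm pixel pitch below are assumptions, not values from the SODA paper):

```python
def gsd_cm_per_px(altitude_m, focal_length_mm, pixel_pitch_um):
    """Ground sample distance: ground distance covered by one pixel (cm).

    GSD = altitude * pixel_pitch / focal_length, with unit conversions folded in.
    """
    return altitude_m * pixel_pitch_um / focal_length_mm * 0.1

def pixels_across(object_size_m, altitude_m, focal_length_mm, pixel_pitch_um):
    """Approximate number of pixels an object of a given size spans."""
    gsd = gsd_cm_per_px(altitude_m, focal_length_mm, pixel_pitch_um)
    return object_size_m * 100.0 / gsd

# Illustrative camera: 8.8 mm focal length, 2.4 µm pixel pitch.
for altitude in (5, 30):
    gsd = gsd_cm_per_px(altitude, 8.8, 2.4)
    span = pixels_across(0.5, altitude, 8.8, 2.4)
    print(f"{altitude} m: {gsd:.3f} cm/px, 0.5 m object spans ~{span:.0f} px")
```

Under these assumed optics, moving from 5 m to 30 m shrinks a 0.5 m object's pixel span by a factor of six, which is the core difficulty the dataset targets.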

Similar Papers
  • Conference Article
  • Citations: 995
  • 10.1109/wacv.2014.6836101
Beyond PASCAL: A benchmark for 3D object detection in the wild
  • Mar 1, 2014
  • Yu Xiang + 2 more

3D object detection and pose estimation methods have become popular in recent years since they can handle ambiguities in 2D images and also provide a richer description for objects compared to 2D object detectors. However, most of the datasets for 3D recognition are limited to a small amount of images per category or are captured in controlled environments. In this paper, we contribute the PASCAL3D+ dataset, a novel and challenging dataset for 3D object detection and pose estimation. PASCAL3D+ augments 12 rigid categories of the PASCAL VOC 2012 [4] with 3D annotations. Furthermore, more images are added for each category from ImageNet [3]. PASCAL3D+ images exhibit much more variability compared to the existing 3D datasets, and on average there are more than 3,000 object instances per category. We believe this dataset will provide a rich testbed to study 3D detection and pose estimation and will help to significantly push forward research in this area. We provide the results of variations of DPM [6] on our new dataset for object detection and viewpoint estimation in different scenarios, which can be used as baselines for the community. Our benchmark is available online at http://cvgl.stanford.edu/projects/pascal3d.

  • Research Article
  • Citations: 25
  • 10.1016/j.nhres.2022.10.002
Object detection in high resolution optical image based on deep learning technique
  • Oct 12, 2022
  • Natural Hazards Research
  • Wenwen Qi


  • Research Article
  • 10.1016/s0923-5965(02)00074-7
Special Issue on Recent Advances in Wireless Video
  • Sep 1, 2002
  • Signal Processing: Image Communication
  • Yucel Altunbasak + 3 more


  • Book Chapter
  • Citations: 1
  • 10.1007/978-3-031-25825-1_2
UnseenNet: Fast Training Detector for Unseen Concepts with No Bounding Boxes
  • Jan 1, 2023
  • Asra Aslam + 1 more

Training object detection models with less data is currently the focus of existing N-shot learning models in computer vision. Such methods use object-level labels and take hours to train on unseen classes. In many cases, a large amount of image-level labels is available for training but cannot be utilized by few-shot object detection models. There is a need for a machine learning framework that can be trained on any unseen class and used in real-time situations. In this paper, we propose an "Unseen Class Detector" that can be trained within a short time for any possible unseen class, without bounding boxes, at competitive accuracy. We build our approach on "Strong" and "Weak" baseline detectors, which we trained on object detection and image classification datasets, respectively. Unseen concepts are fine-tuned on the strong baseline detector using only image-level labels and further adapted by transferring classifier-detector knowledge between baselines. We use semantic as well as visual similarities to identify the source class (e.g. sheep) for the fine-tuning and adaptation of an unseen class (e.g. goat). Our model (UnseenNet) is trained on the ImageNet classification dataset for unseen classes and tested on an object detection dataset (OpenImages). UnseenNet improves the mean average precision (mAP) by 10% to 30% over existing baselines (semi-supervised and few-shot) of object detection. Moreover, the training time of the proposed model is under 10 minutes for each unseen class.

Keywords: Weakly supervised learning, Object detection, Transfer learning, Domain adaptation, Computer vision

  • Research Article
  • Citations: 213
  • 10.1016/j.eswa.2022.116793
Remote sensing image super-resolution and object detection: Benchmark and state of the art
  • Mar 2, 2022
  • Expert Systems with Applications
  • Yi Wang + 7 more


  • Conference Article
  • Citations: 11
  • 10.1109/iccc54389.2021.9674247
Small Moving Object Detection and Tracking Based on Event Signals
  • Dec 10, 2021
  • Yuanjun Shu + 4 more

Detecting and tracking small moving objects with high maneuverability, such as drones and missiles, is challenging. In order to detect and track moving objects, the camera often needs to rotate with the movement of the target. When using RGB cameras, problems of motion blur, information redundancy, and high computational cost occur. The event camera is a new type of camera that outputs only moving-target signals by judging intensity changes of pixel values, reducing data redundancy, and it is very well suited to detecting and tracking small objects with high maneuverability. However, when the event camera rotates, the texture edges of the background are also captured, which seriously affects target object detection. To this end, this paper proposes an event-based moving object detection and tracking method built on registration and foreground enhancement models. Due to the lack of event-based small object detection datasets, we also built a dataset for small object detection. Extensive experiments show that our proposed method can effectively detect small moving objects with high maneuverability.

  • Research Article
  • Citations: 7
  • 10.1049/ipr2.12288
Integration of gradient guidance and edge enhancement into super‐resolution for small object detection in aerial images
  • Jun 22, 2021
  • IET Image Processing
  • Jinzhen Mu + 3 more

Detecting small objects is difficult because of their poor-quality appearance and small size, and these issues are especially pronounced for aerial images, where small objects are often of great importance. To address the small object detection (SOD) problem, a unified architecture is used that upsamples small objects into super-resolved versions with characteristics similar to those of large objects, enabling more discriminative detection. For this purpose, a new end-to-end multi-task generative adversarial network (GAN) is proposed. In this architecture, the generator is a super-resolution (SR) network, and the discriminator is a multi-task network. In the generator, a gradient guide and an edge-enhancement strategy are introduced to alleviate structural distortions. In the discriminator, a faster region-based convolutional neural network (FRCNN) is incorporated for the object detection task. Specifically, the discriminator outputs a distribution scalar to measure realness. Each super-resolved image then passes through the discriminator, yielding a realness distribution, classification scores, and bounding box regression offsets. Furthermore, the losses of the detection task are backpropagated into the generator during training rather than being optimized independently. Extensive experiments on the challenging cars overhead with context (COWC), detection in optical remote sensing images (DIOR), vision meets drones (VisDrone), and object detection in aerial images (DOTA) datasets demonstrate the effectiveness of the proposed method in reconstructing structures while generating natural super-resolved images, and show its superiority in detecting small objects over state-of-the-art detectors.

  • Research Article
  • Citations: 81
  • 10.1016/j.jvcir.2023.103830
Rethinking PASCAL-VOC and MS-COCO dataset for small object detection
  • Apr 29, 2023
  • Journal of Visual Communication and Image Representation
  • Kang Tong + 1 more


  • Research Article
  • Citations: 4
  • 10.31272/jeasd.2682
Improving Tiny Object Detection in Aerial Images with Yolov5
  • Jan 1, 2025
  • Journal of Engineering and Sustainable Development
  • Ahmed Sharba + 1 more

Object detection is a major area of computer vision work, particularly for aerial surveillance and traffic control applications, where detecting vehicles from aerial images is essential. However, such images often lack semantic detail and struggle to identify small, densely packed objects accurately. This paper proposes improvements to the You Only Look Once version 5 (YOLOv5) model to enhance small object detection. Key modifications include adding a new prediction head with a 160×160 feature map, replacing the Sigmoid Linear Unit (SiLU) activation function with the Exponential Linear Unit (ELU), and swapping the Spatial Pyramid Pooling – Fast (SPPF) module with the Spatial Pyramid Pooling (SPP) module. The enhanced model was tested on two datasets: Dataset for Object Detection in Aerial Images (DOTA) v1.5 and CarJet, which focused on vehicle and plane detection. Results showed a 7.1% increase in mean Average Precision (mAP) on the DOTA dataset and a 2.3% improvement on the CarJet dataset, measured with an Intersection over Union (IoU) threshold of 0.5. These architectural changes to YOLOv5 notably improve small object detection accuracy, offering valuable potential for aerial surveillance and traffic control tasks.
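The mAP figures above are computed at an Intersection over Union (IoU) threshold of 0.5, meaning a predicted box counts as a correct detection only if it overlaps a ground-truth box by at least 50%. A minimal sketch of the IoU computation for axis-aligned boxes (the coordinates in the usage example are made up for illustration):

```python
def iou(box_a, box_b):
    """Intersection over Union for axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle: clamp to zero when the boxes do not overlap.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    # Union = sum of areas minus the double-counted intersection.
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Example: a prediction shifted by 2 px in each direction on a 10x10 ground truth.
print(iou((0, 0, 10, 10), (2, 2, 12, 12)))  # ~0.47, below the 0.5 threshold
```

As the example shows, even a modest localization offset on a small box can push IoU below 0.5, which is one reason small object mAP is so sensitive to the improvements described above.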

  • Research Article
  • Citations: 51
  • 10.1016/j.isprsjprs.2022.11.008
Manipal-UAV person detection dataset: A step towards benchmarking dataset and algorithms for small object detection
  • Nov 28, 2022
  • ISPRS Journal of Photogrammetry and Remote Sensing
  • Akshatha K.R + 5 more


  • Research Article
  • Citations: 3
  • 10.3390/rs17213637
Infrared-Visible Image Fusion Meets Object Detection: Towards Unified Optimization for Multimodal Perception
  • Nov 4, 2025
  • Remote Sensing
  • Xiantai Xiang + 8 more

Infrared-visible image fusion and object detection are crucial components in remote sensing applications, each offering unique advantages. Recent research has increasingly sought to combine these tasks to enhance object detection performance. However, the integration of these tasks presents several challenges, primarily due to two overlooked issues: (i) existing infrared-visible image fusion methods often fail to adequately focus on fine-grained or dense information, and (ii) while joint optimization methods can improve fusion quality and downstream task performance, their multi-stage training processes often reduce efficiency and limit the network’s global optimization capability. To address these challenges, we propose the UniFusOD method, an efficient end-to-end framework that simultaneously optimizes both infrared-visible image fusion and object detection tasks. The method integrates Fine-Grained Region Attention (FRA) for region-specific attention operations at different granularities, enhancing the model’s ability to capture complex information. Furthermore, UnityGrad is introduced to balance the gradient conflicts between fusion and detection tasks, stabilizing the optimization process. Extensive experiments demonstrate the superiority and robustness of our approach. Not only does UniFusOD achieve excellent results in image fusion, but it also provides significant improvements in object detection performance. The method exhibits remarkable robustness across various tasks, achieving a 0.8 and 1.9 mAP50 improvement over state-of-the-art methods on the DroneVehicle dataset for rotated object detection and the M3FD dataset for horizontal object detection, respectively.

  • Research Article
  • Citations: 4
  • 10.1051/itmconf/20235402006
Simulated UAV dataset for object detection
  • Jan 1, 2023
  • ITM Web of Conferences
  • Avinash Kaur Sama + 1 more

Unmanned Aerial Vehicles (UAVs) have become increasingly popular for various applications, including object detection. Novel detector algorithms require large datasets to improve, as they are still evolving. Additionally, in countries with restrictive drone policies, simulated datasets can provide a cost-effective and efficient alternative to real-world datasets for researchers to develop and test their algorithms in a safe and controlled environment. To address this, we propose a simulated dataset for object detection through a Gazebo simulator that covers both indoor and outdoor environments. The dataset consists of 11,103 annotated frames with 27,412 annotations, of persons and cars as the objects of interest. This dataset can be used to evaluate detector proposals for object detection, providing a valuable resource for researchers in the field. The dataset is annotated using the Dark Label software, which is a popular tool for object annotation. Additionally, we assessed the dataset’s performance using advanced object detection systems, with YOLOv3 achieving 86.9 mAP50-95, YOLOv3-tiny achieving 79.5 mAP50-95, YOLOv5 achieving 82.2 mAP50-95, YOLOv7 achieving 61.8 mAP50-95 and YOLOv8 achieving 87.8 mAP50-95. Overall, this simulated dataset is a valuable resource for researchers working in the field of object detection.
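The mAP50-95 metric reported above averages the average precision over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. A minimal sketch of that averaging, assuming per-threshold AP values have already been computed (the AP numbers below are hypothetical, chosen only to illustrate that AP typically drops as the threshold tightens):

```python
def map50_95(ap_by_threshold):
    """Average AP over the ten IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]
    return sum(ap_by_threshold[t] for t in thresholds) / len(thresholds)

# Hypothetical per-threshold APs, decreasing as localization requirements tighten.
ap = {round(0.50 + 0.05 * i, 2): 0.90 - 0.05 * i for i in range(10)}
print(round(map50_95(ap), 4))  # mean of 0.90, 0.85, ..., 0.45 -> 0.675
```

Because the higher thresholds demand near-perfect localization, mAP50-95 is a stricter score than mAP50, which is worth keeping in mind when comparing the YOLO figures above against the mAP50 results reported elsewhere on this page.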

  • Research Article
  • Citations: 46
  • 10.1016/j.asej.2023.102387
An effective obstacle detection system using deep learning advantages to aid blind and visually impaired navigation
  • Jul 16, 2023
  • Ain Shams Engineering Journal
  • Ahmed Ben Atitallah + 5 more


  • Research Article
  • Citations: 7
  • 10.1016/j.dibe.2024.100383
Precast concrete project image dataset for deep learning object detection
  • Feb 28, 2024
  • Developments in the Built Environment
  • Jun Young Jang + 4 more


  • Research Article
  • Citations: 5
  • 10.1016/j.procs.2020.09.215
Automatic Construction of Dataset with Automatic Annotation for Object Detection
  • Jan 1, 2020
  • Procedia Computer Science
  • Naok Watanabe + 5 more

