Abstract

This article addresses the challenging problem of vehicle detection in high-resolution remote sensing imagery. It introduces a novel medium-sized annotated dataset, the satellite imagery multivehicles dataset (SIMD), together with an adapted single-pass deep multiscale object detection framework designed to detect objects of multiple sizes and types as seen from an above-ground perspective. The dataset images are acquired from multiple locations in the EU and US regions available in public Google Earth satellite imagery. Specifically, the dataset comprises 5000 images of resolution 1024 × 768 and collectively contains 45 096 objects in 15 different classes of vehicles, including cars, trucks, buses, long vehicles, various types of aircraft, and boats. In the proposed architecture, we demonstrate the modifications needed to adapt state-of-the-art object detection frameworks to remote sensing imagery. The proposed architecture has been evaluated on SIMD and on the public dataset VEDAI. A comparative analysis against existing off-the-shelf single-shot object detection models, including YOLO and YOLT, shows that the proposed model yields superior performance under standard evaluation strategies. To encourage further research in this domain, the SIMD dataset and the corresponding architecture are publicly available at this link: http://vision.seecs.edu.pk/simd .

Highlights

  • Movable object detection in aerial or satellite imagery is of great practical interest owing to its variety of applications in numerous fields, including traffic monitoring, airport surveillance, parking lot analysis, search and rescue (SAR), determining transportation infrastructure, etc.

  • Tayara and Chong [23] used pyramid-style convolutional neural networks (CNNs) built on multiple backbone networks, including VGG-16, ResNet-50, and ResNet-101, stacked with feature maps for the purpose of object detection.

  • Data annotation: Most current object detection models, such as Faster R-CNN [3] and SSD [4], work on horizontal bounding boxes. We therefore annotate our dataset in the same way: images have been annotated with plain horizontal and vertical (axis-aligned) rectangles instead of oriented bounding boxes.

Summary

INTRODUCTION

Movable object detection in aerial or satellite imagery is of great practical interest owing to its variety of applications in numerous fields, including traffic monitoring, airport surveillance, parking lot analysis, search and rescue (SAR), determining transportation infrastructure, etc. The task of object detection has relied on appearance-based handcrafted features that encode geometric and structural attributes such as color, texture, and shape. These features are then fed to a typical machine learning classifier, such as a support vector machine or a random forest, to detect the item of interest. A large-scale dataset for vehicle detection, with annotations in three commonly used formats, has been presented; it consists of 5000 satellite images containing around 45 000 vehicular objects categorized into 15 dedicated classes. Such diversity of vehicle appearances should enable further progress in automatic scene analysis, scene surveillance, and target detection. The Google Earth platform offers the flexibility to download high-resolution RGB images with a predefined viewpoint and altitude, which makes it suitable for acquiring drone-like imagery, and the annotations of the images can be used in a relatively platform-independent manner.
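Because the dataset uses horizontal (axis-aligned) bounding boxes and ships annotations in several formats, it can help to see how one box maps between two common encodings. The following is a minimal sketch, not the authors' tooling. It assumes that one encoding is pixel corners (xmin, ymin, xmax, ymax) and the other is a YOLO-style normalized (cx, cy, w, h) tuple; the source does not name the three formats, so both choices are assumptions. The function names are hypothetical, and the only value taken from the text is the 1024 × 768 image size.

```python
from typing import Tuple

# Image resolution stated in the abstract (width x height, in pixels).
IMG_W, IMG_H = 1024, 768

Box = Tuple[float, float, float, float]


def corners_to_yolo(xmin: float, ymin: float, xmax: float, ymax: float,
                    img_w: int = IMG_W, img_h: int = IMG_H) -> Box:
    """Convert pixel corners to a normalized (cx, cy, w, h) tuple.

    Hypothetical helper: the normalized center format is assumed here,
    not confirmed as one of the dataset's annotation formats.
    """
    if not (0 <= xmin < xmax <= img_w and 0 <= ymin < ymax <= img_h):
        raise ValueError("box must lie inside the image and have positive area")
    cx = (xmin + xmax) / 2.0 / img_w   # box center, as a fraction of width
    cy = (ymin + ymax) / 2.0 / img_h   # box center, as a fraction of height
    w = (xmax - xmin) / img_w          # box width, as a fraction of width
    h = (ymax - ymin) / img_h          # box height, as a fraction of height
    return cx, cy, w, h


def yolo_to_corners(cx: float, cy: float, w: float, h: float,
                    img_w: int = IMG_W, img_h: int = IMG_H) -> Box:
    """Inverse conversion: normalized (cx, cy, w, h) back to pixel corners."""
    xmin = (cx - w / 2.0) * img_w
    ymin = (cy - h / 2.0) * img_h
    xmax = (cx + w / 2.0) * img_w
    ymax = (cy + h / 2.0) * img_h
    return xmin, ymin, xmax, ymax
```

For example, `corners_to_yolo(256, 192, 768, 576)` describes a box covering the central half of a 1024 × 768 image in each dimension, and `yolo_to_corners` recovers the original pixel corners from the result.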

RELATED WORK
Available Datasets
CNN Models for Aerial Object Detection
Data Collection
Annotation Formats
Characteristics of Data
Brief Introduction to CNNs
Proposed Model
Experiments and Model Fine Tuning
Training Details
EXPERIMENTAL EVALUATIONS
Dataset Evaluations
Qualitative Evaluations
Ablation Study
DISCUSSION
Varying Multisized Objects
Preprocessing and Data Augmentation
Fine-Grained Classification
Findings
Deeper Networks
CONCLUSION
