Abstract

Geo-referenced real-time vehicle and person tracking in aerial imagery has a variety of applications, such as traffic and large-scale event monitoring, disaster management, and input for predictive traffic and crowd models. However, object tracking in aerial imagery remains a challenging, unsolved problem due to the tiny size of the objects, their varying scales, and the limited temporal resolution of geo-referenced datasets. In this work, we propose a new approach based on Convolutional Neural Networks (CNNs) to track multiple vehicles and people in aerial image sequences. As the large number of objects in aerial images can exponentially increase the processing demands in multiple object tracking scenarios, the proposed approach uses a stack of micro CNNs, where each micro CNN is responsible for a single-object tracking task. We call our approach Stack of Micro-Single-Object-Tracking CNNs (SMSOT-CNN). More precisely, using a two-stream CNN, we extract a set of features from two consecutive frames for each object, given the location of the object in the previous frame. Then, we assign the extracted features of each object to an MSOT-CNN, which predicts the object location in the current frame. We train and validate the proposed approach on the vehicle and person sets of the KIT AIS dataset for object tracking in aerial image sequences. Results indicate that the proposed approach tracks multiple vehicles and people accurately and time-efficiently.
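The per-object tracking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `context` enlargement factor, and the `centre_regressor` stand-in for a trained micro CNN are all assumptions introduced here for illustration. The sketch only shows the bookkeeping around the network: cropping a region around each object's previous location in two consecutive frames, passing the pair of crops to a per-object regressor, and mapping the predicted box back to image coordinates.

```python
import numpy as np


def crop_search_region(frame, box, context=2.0):
    """Crop a region centred on `box` = (x1, y1, x2, y2), enlarged by `context`.

    `context` is a hypothetical parameter. Returns the crop (clipped to the
    frame) and the (x0, y0) offset of the crop's top-left corner in the frame.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = (x2 - x1) * context, (y2 - y1) * context
    frame_h, frame_w = frame.shape[:2]
    x0 = int(max(0, np.floor(cx - w / 2)))
    y0 = int(max(0, np.floor(cy - h / 2)))
    x3 = int(min(frame_w, np.ceil(cx + w / 2)))
    y3 = int(min(frame_h, np.ceil(cy + h / 2)))
    return frame[y0:y3, x0:x3], (x0, y0)


def centre_regressor(target_crop, search_crop):
    """Placeholder for a trained micro CNN (assumption, not a real model).

    It always predicts a box in the centre of the search crop, i.e. it
    assumes the object did not move between the two frames.
    """
    h, w = search_crop.shape[:2]
    return (w / 4.0, h / 4.0, 3 * w / 4.0, 3 * h / 4.0)


def track_step(prev_frame, curr_frame, prev_boxes, regressor, context=2.0):
    """Run one single-object tracking step per object, independently."""
    new_boxes = []
    for box in prev_boxes:
        # Stream 1: the object as it appeared in the previous frame.
        target, _ = crop_search_region(prev_frame, box, context)
        # Stream 2: the same image region in the current frame.
        search, (x0, y0) = crop_search_region(curr_frame, box, context)
        # The regressor predicts a box in search-crop coordinates.
        bx1, by1, bx2, by2 = regressor(target, search)
        # Shift the prediction back into full-image coordinates.
        new_boxes.append((bx1 + x0, by1 + y0, bx2 + x0, by2 + y0))
    return new_boxes


if __name__ == "__main__":
    frame = np.zeros((100, 100), dtype=np.uint8)
    print(track_step(frame, frame, [(40, 40, 50, 50)], centre_regressor))
```

Because each object is processed independently from its own pair of crops, the loop in `track_step` could be batched or parallelised; in this sketch it is kept sequential for readability.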

Highlights

  • Multi-person and -vehicle tracking has several applications such as large-scale event and traffic monitoring, disaster management, and predictive traffic and crowd modeling

  • Visual Object Tracking (VOT) methods based on deep Convolutional Neural Networks (CNNs) (Girshick et al, 2014, Girshick, 2015, Ren et al, 2015, Lin et al, 2017) have shown promising performance in multi-object tracking (MOT) scenarios (Wojke et al, 2017, Bewley et al, 2016)

  • The KIT AIS dataset is composed of 9 vehicle tracking sequences; the training configuration is similar to the original GOTURN


Summary

Introduction

Multi-person and -vehicle tracking has several applications such as large-scale event and traffic monitoring, disaster management, and predictive traffic and crowd modeling. Visual Object Tracking (VOT) methods based on deep Convolutional Neural Networks (CNNs) (Girshick et al, 2014, Girshick, 2015, Ren et al, 2015, Lin et al, 2017) have shown promising performance in multi-object tracking (MOT) scenarios (Wojke et al, 2017, Bewley et al, 2016). However, most of these methods suffer from high computational costs and slow processing, especially when extracting features from each candidate object location in every frame (El-Shafie et al, 2019). To employ CNNs for VOT, one approach is to train them online as object-versus-background classifiers and apply them to a number of sampled candidate regions, where the region with the highest classification score is selected as the most visually sim-
