Abstract

In recent years, deep learning methods have achieved great success in vehicle detection tasks on aerial imagery. However, most existing methods focus only on extracting latent vehicle target features and rarely consider the scene context as vital prior knowledge. In this letter, we propose a scene context attention-based fusion network (SCAF-Net) to fuse the scene context of vehicles into an end-to-end vehicle detection network. First, we propose a novel strategy, patch cover, to preserve as much of the original target and scene context information in large-scale raw aerial images as possible. Next, we use an improved YOLO-v3 network as one branch of SCAF-Net to generate vehicle candidates on each patch. A second, novel branch extracts the latent scene context of vehicles on each patch without any extra annotations. These two branches are then concatenated into a fusion network, and an attention-based model further extracts vehicle candidates from each local scene. Finally, the vehicle candidates of all patches are merged by global non-maximum suppression (g-NMS) to produce the detection result for the whole original image. Experimental results demonstrate that our proposed method outperforms the comparison methods in both detection accuracy and speed. Our code is released at https://github.com/minghuicode/SCAF-Net.
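The pipeline above hinges on two generic steps: tiling a large image into overlapping patches and merging the per-patch detections in global coordinates. Below is a minimal Python sketch of these two steps only; the helper names `patch_cover` and `global_nms`, the patch size, and the overlap are illustrative assumptions rather than the authors' actual implementation, and the per-patch detector (the improved YOLO-v3 and scene-context branches) is treated as a black box.

```python
import numpy as np

def patch_cover(height, width, patch=512, overlap=64):
    """Tile a large aerial image into overlapping patches so each vehicle
    and its surrounding scene context appear intact in at least one patch.
    Returns the top-left (y, x) offset of every patch.
    Patch size and overlap are illustrative assumptions."""
    stride = patch - overlap
    last_y, last_x = max(height - patch, 0), max(width - patch, 0)
    ys = list(range(0, last_y + 1, stride))
    xs = list(range(0, last_x + 1, stride))
    if ys[-1] != last_y:  # make sure the bottom border is covered
        ys.append(last_y)
    if xs[-1] != last_x:  # make sure the right border is covered
        xs.append(last_x)
    return [(y, x) for y in ys for x in xs]

def global_nms(boxes, scores, iou_thr=0.5):
    """Merge candidates from all patches (boxes already shifted into global
    image coordinates, shape (N, 4) as x1, y1, x2, y2) with a single
    non-maximum suppression pass over the whole image."""
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # IoU between the kept box and all remaining candidates.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thr]
    return keep
```

In use, each patch offset from `patch_cover` would be added to that patch's detections before calling `global_nms`, so that suppression operates in the coordinate frame of the full original image.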
