Abstract

The multi-label classification problem in Unmanned Aerial Vehicle (UAV) images is particularly challenging compared to single-label classification due to its combinatorial nature. To tackle this issue, we propose in this paper a deep learning approach based on an encoder-decoder neural network architecture with channel and spatial attention mechanisms. Specifically, the encoder module, which is based on a pre-trained convolutional neural network (CNN), has the task of transforming the input image into a set of feature maps using a suitable feature combination. To further improve the feature representation, this module incorporates a squeeze-and-excitation (SE) layer for modelling the interdependencies between the channels of the feature maps. The decoder module, which is based on a long short-term memory (LSTM) network, has the task of generating, in a sequential manner, the classes present in the image. At each time step, it predicts the next class label by aligning its hidden state with the corresponding region in the image by means of an adaptive spatial attention mechanism. The experiments carried out on two UAV datasets with a spatial resolution of 2 cm show that our method is promising in predicting the labels present in the image while attending to the relevant objects in the image. Additionally, it provides better classification results than state-of-the-art methods.
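The squeeze-and-excitation recalibration described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function and weight names (`squeeze_excitation`, `w1`, `w2`) and the reduction ratio are assumptions for the sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def squeeze_excitation(fmaps, w1, w2):
    """Squeeze-and-excitation over feature maps of shape (C, H, W).

    w1: (C//r, C) weights of the reduction FC layer (r = reduction ratio)
    w2: (C, C//r) weights of the expansion FC layer
    """
    # Squeeze: global average pooling collapses each H x W map to one scalar.
    z = fmaps.mean(axis=(1, 2))                 # shape (C,)
    # Excitation: a bottleneck MLP models channel interdependencies.
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))   # shape (C,), values in (0, 1)
    # Scale: reweight each channel of the input by its learned importance.
    return fmaps * s[:, None, None]
```

The output has the same shape as the input; each channel is simply rescaled by a learned scalar in (0, 1), which is how the SE layer emphasizes informative channels and suppresses less useful ones.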

Highlights

  • Unmanned aerial vehicles (UAVs), commonly known as drones, have proven effective in collecting images with extremely high spatial detail over inaccessible areas and zones of limited coverage, thanks to their small size and fast deployment

  • We propose an alternative solution based on an encoder-decoder neural network architecture with channel and spatial attention mechanisms

  • We evaluated the proposed attention network on two UAV datasets acquired over the Faculty of Science of the University of Trento (Italy) and near the city of Civezzano (Italy) in October 2011 and 2012 by means of a UAV equipped with imaging sensors spanning the visible range (Figure 4)
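The adaptive spatial attention used by the decoder to align its hidden state with image regions can be sketched as a standard soft-attention step. This is an illustrative sketch only: the projection weights `w_f`, `w_h`, `v` and the function name are assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def spatial_attention(features, hidden, w_f, w_h, v):
    """Soft spatial attention over flattened image regions.

    features: (N, D) encoder feature vectors, one per region (N = H*W)
    hidden:   (M,)   current decoder (LSTM) hidden state
    w_f (A, D), w_h (A, M), v (A,): illustrative projection weights
    """
    # Score each region against the decoder state (additive attention).
    scores = np.tanh(features @ w_f.T + (w_h @ hidden)) @ v   # (N,)
    # Normalize scores into attention weights over regions.
    alpha = softmax(scores)                                   # (N,), sums to 1
    # Context vector: attention-weighted summary of the regions.
    context = alpha @ features                                # (D,)
    return context, alpha
```

At each time step the decoder would consume `context` to predict the next class label, and `alpha` indicates which image regions were attended.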

Introduction

The increasing adoption of unmanned aerial vehicles (UAVs), commonly known as drones, has demonstrated their effectiveness in collecting images with extremely high spatial detail over inaccessible areas and zones of limited coverage, thanks to their small size and fast deployment.

As the network goes deeper, it uses high-dimensional representations through two inception modules of type C, referred to as 2×Inception Module C (Figure 1). This network includes several improvements over the original GoogLeNet (inception-v1), which was the winner of ILSVRC14 (ImageNet Large Scale Visual Recognition Challenge). These improvements include: 1) the RMSProp optimizer; 2) factorized 7 × 7 convolutions; 3) batch normalization in the auxiliary classifiers; and 4) label smoothing, a regularizing term added to the loss that prevents the network from becoming too confident about a class and thereby reduces overfitting.
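The label-smoothing regularizer mentioned above can be sketched concretely. This is a minimal illustration under standard assumptions (uniform smoothing with `eps = 0.1`, as commonly used with inception-v3); the helper names are illustrative.

```python
import numpy as np

def smooth_labels(one_hot, eps=0.1):
    """Label smoothing: soften hard 0/1 targets so the network is never
    pushed toward fully confident predictions. Each true class keeps
    probability 1 - eps + eps/K; every other class receives eps/K."""
    k = one_hot.shape[-1]                          # number of classes K
    return one_hot * (1.0 - eps) + eps / k

def cross_entropy(probs, targets):
    """Mean cross-entropy between predicted probabilities and (soft) targets."""
    return float(-(targets * np.log(probs)).sum(axis=-1).mean())
```

For example, with 4 classes and `eps = 0.1`, a hard target `[1, 0, 0, 0]` becomes `[0.925, 0.025, 0.025, 0.025]`: the loss still favors the true class but never rewards a fully saturated prediction.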
