Multistage approach for automatic target detection and recognition in infrared imagery using deep learning

Nada Baili, Mahdi Moalla, Hichem Frigui, Andrew D Karem

doi:10.1117/1.jrs.16.048505

Abstract

Automatic target recognition (ATR) is a challenging task for several computer vision applications. It requires efficient, accurate, and robust methods for target detection and target identification. Deep learning has shown great success in many computer vision applications involving color (RGB) images. However, the performance of deep networks in ATR with infrared sensor data needs further investigation. In this paper, we propose a multistage automatic target detection and recognition (ATDR) system that performs both target detection and target classification on infrared (IR) imagery using deep learning. Our system processes large IR image frames in which targets occupy <1% of the total number of pixels. First, we train a state-of-the-art object detector, You Only Look Once (YOLO), to localize all potential targets in the input image frame. Then, we train a convolutional neural network (CNN) to identify these detections as targets or false alarms. In this second phase, we adapt and analyze the performance of three CNN architectures: a compact and fully connected CNN, VGG-16 with batch normalization, and a wide residual neural network (WRN). We also explore the use of a loss function that directly optimizes the area under the receiver operating characteristic (ROC) curve (AUC), and adapt it to our ATR application. To enhance the robustness of the proposed ATR to perturbations and variations introduced during the detection stage, we train our CNN classifiers on targets automatically detected by YOLO, in addition to ground truth bounding boxes, and apply selected data augmentation techniques. To simulate real testing environments, where the spatial location of the targets within the image frame is unknown, only YOLO-detected boxes are used during validation. We evaluate our ATDR on a real benchmark dataset that includes different vehicles captured at different resolutions.
Our experiments show that YOLO can detect most of the targets at the expense of generating a high number of false alarms. We show that the VGG-16 network with batch normalization, which is the best-performing model, can correctly identify the classes of the targets and classify the majority of YOLO's false detections into an additional nontarget class. We also show that training with the proposed AUC-based loss function is advantageous for ATR, mainly in identifying difficult targets.
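The abstract does not give the form of the AUC-based loss, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It relies on the standard fact that the empirical AUC equals the fraction of (target, nontarget) score pairs that are ranked correctly, and it pairs that with a generic squared-hinge pairwise surrogate that could be minimized during training. The function names and the `margin` parameter are assumptions introduced here for illustration.

```python
def empirical_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


def pairwise_auc_surrogate(pos_scores, neg_scores, margin=1.0):
    """Generic squared-hinge surrogate (an assumption, not the paper's loss).

    Penalizes every pair in which the positive score does not exceed the
    negative score by at least `margin`; averaged over all pairs.
    """
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            gap = margin - (p - n)
            if gap > 0:
                total += gap * gap
    return total / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical classifier scores for target and false-alarm detections.
    targets = [0.9, 0.4]
    false_alarms = [0.5, 0.1]
    print(empirical_auc(targets, false_alarms))
    print(pairwise_auc_surrogate(targets, false_alarms))
```

Because the surrogate is computed over pairs rather than individual samples, a low-scoring target that is outranked by many false alarms contributes heavily to it, which is one plausible reason an AUC-oriented objective can help with difficult targets.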

Full Text