Abstract

Over the last couple of decades, Convolutional Neural Networks (CNNs) have emerged as one of the most active fields of research. CNNs have numerous applications, and their architectures are used to improve accuracy and efficiency across many domains. In this paper, we use CNNs to fuse visible and thermal camera images and to detect whether a person is present in those images, targeting a reliable surveillance application. Various image fusion methods exist, covering multi-sensor, multi-modal, multi-focus, and multi-view fusion. Our proposed methodology uses an encoder-decoder architecture to fuse the visible and thermal images and a ResNet-152 architecture to classify whether a person is present in the fused image. The Korea Advanced Institute of Science and Technology (KAIST) multispectral dataset, consisting of 95,000 visible and thermal images, is used to train the CNNs. Experiments show that the fused architecture outperforms the individual visible- and thermal-based architectures, achieving 99.2% accuracy, compared with 97% for the visible-only and 97.6% for the thermal-only architectures.
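The paper's fusion is learned end-to-end by an encoder-decoder CNN; as a simple illustrative baseline only (not the authors' method), the underlying idea of combining a visible and a thermal image into a single fused image can be sketched as a pixel-wise weighted average. All names and the `alpha` parameter below are hypothetical:

```python
def fuse_weighted(visible, thermal, alpha=0.5):
    """Pixel-wise weighted-average fusion of two equally sized
    grayscale images, given as lists of rows of intensities in [0, 255].
    `alpha` weights the visible image; (1 - alpha) weights the thermal one.
    A learned encoder-decoder would replace this fixed rule with
    features extracted and recombined by convolutional layers."""
    assert len(visible) == len(thermal), "images must have equal height"
    fused = []
    for row_v, row_t in zip(visible, thermal):
        assert len(row_v) == len(row_t), "rows must have equal width"
        fused.append([alpha * v + (1 - alpha) * t
                      for v, t in zip(row_v, row_t)])
    return fused

# Tiny 2x2 example: the visible image is uniformly bright, while the
# thermal image highlights a warm region; the fused image keeps both cues.
vis = [[200, 200], [200, 200]]
thm = [[0, 255], [255, 0]]
print(fuse_weighted(vis, thm, alpha=0.5))  # [[100.0, 227.5], [227.5, 100.0]]
```

A fixed average like this discards no modality but cannot adapt to content; the encoder-decoder approach described in the abstract learns which features of each modality to retain, which is what enables the reported accuracy gain of the fused model over either single-modality model.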
