Abstract

In Computer vision object detection and classification are active fields of research. Applications of object detection and classification include a diverse range of fields such as surveillance, autonomous cars and robotic vision. Many intelligent systems are built by researchers to achieve the accuracy of human perception but could not quite achieve it yet. Convolutional Neural Networks (CNN) and Deep Learning architectures are used to achieve human like perception for object detection and scene identification. We are proposing a novel method by combining previously used techniques. We are proposing a model which takes multi-spectral images, fuses them together, drops the useless images and then provides semantic segmentation for each object (person) present in the image. In our proposed methodology we are using CNN for fusion of Visible and thermal images and Deep Learning architectures for classification and localization. Fusion of visible and thermal images is carried out to combine informative features of both images into one image. For fusion we are using Encoder-decoder architecture. Fused image is then fed into Resnet-152 architecture for classification of images. Images obtained from Resnet-152 are then fed into Mask-RCNN for localization of persons. Mask-RCNN uses Resnet-101 architecture for localization of objects. From the results it can be clearly seen that Fused model for object localization outperforms the Visible model and gives promising results for person detection for surveillance purposes. Our proposed model gives the Miss Rate of 5.25% which is much better than the previous state of the art method applied on KAIST dataset.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call