Abstract

In the defence and security domain, camera systems are widely used for surveillance. The major advantage of using camera systems for surveillance is that they provide high-resolution imagery, which is easy to interpret. However, the use of camera systems and optical imagery has some drawbacks, especially for application in the military domain. In poor lighting conditions, dust or smoke, the image quality degrades and, additionally, cameras cannot provide range information. These drawbacks can be mitigated by exploiting the strengths of radar. Radar performance is largely maintained at night, in various weather conditions, and in dust and smoke. Moreover, radar provides the distance to detected objects. Since the strengths and weaknesses of radar and camera systems appear complementary, a natural question is: can radar and camera systems learn from each other? Here, the potential of radar/video multimodal learning is evaluated for human activity classification. The novelty of this work is the use of radar spectrograms and related video frames for classification with a multimodal neural network. Radar spectrograms and video frames are both two-dimensional images, but the information they contain is of a different nature. This approach was adopted to limit the required preprocessing load, while maintaining the complementary nature of the sensor data.

Highlights

  • Human activity classification is a major asset in the defence and security domain

  • Cameras are widely used for surveillance; camera systems can be found in cities, in shopping centres, in parking garages, in public transportation, at airports, etc.

  • This widespread use of cameras in the civil domain is motivated by their ease of use and the fact that optical images are easy to interpret for humans, avoiding the need for extended operator training


Summary

| INTRODUCTION

Human activity classification is a major asset in the defence and security domain. The activity or behaviour a person exhibits may (partly) reveal their intent. These results were produced with a CNN trained to separate the two classes N and R, using the unimodal CNN architecture stated in Section 3.1 (right two columns) and a CNN architecture with a larger last convolutional layer (left two columns). In the case of the test subject holding the object, the response to the torso has high saliency. This evaluation confirms the notion that arm motion, or the lack thereof, is the major feature distinguishing a strolling person from a person holding an object in both hands in radar spectrograms, given that a person who is just strolling typically swings their arms.
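The multimodal approach described above, feeding radar spectrograms and video frames as two-dimensional images into one network, can be illustrated with a minimal two-branch (late-fusion) CNN sketch. This is an illustrative assumption in PyTorch, not the architecture from the paper: the layer sizes, input resolutions, and class count are placeholders.

```python
import torch
import torch.nn as nn

class MultimodalCNN(nn.Module):
    """Two-branch CNN fusing a radar spectrogram and a video frame.

    Illustrative sketch only: layer sizes, input resolutions, and the
    two-class output are assumptions, not the paper's architecture.
    """

    def __init__(self, n_classes: int = 2):
        super().__init__()

        def branch(in_channels: int) -> nn.Sequential:
            # Small convolutional feature extractor for one modality
            return nn.Sequential(
                nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            )

        self.radar = branch(1)   # single-channel radar spectrogram
        self.video = branch(3)   # RGB video frame
        # Late fusion: concatenate per-branch features, then classify
        self.head = nn.Sequential(
            nn.Linear(2 * 32 * 4 * 4, 64), nn.ReLU(), nn.Linear(64, n_classes)
        )

    def forward(self, spectrogram: torch.Tensor, frame: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.radar(spectrogram), self.video(frame)], dim=1)
        return self.head(fused)

model = MultimodalCNN(n_classes=2)
logits = model(torch.randn(8, 1, 128, 128),   # batch of radar spectrograms
               torch.randn(8, 3, 128, 128))   # batch of video frames
print(tuple(logits.shape))  # (8, 2)
```

Because both modalities are handled as ordinary 2D images, the same convolutional building blocks can serve both branches, which is one way to keep preprocessing light while preserving the complementary information in each sensor.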

