Abstract

When constructing a deep learning model for recognizing violence inside a vehicle, two aspects are crucial: the computational limitations and the choice of model architecture. Choosing the best model also requires testing and evaluating it against adversarial attacks. This paper presents three model architectures for violence recognition inside a vehicle and evaluates them using adversarial attacks and interpretability methods. An analysis of each model’s convergence is conducted, followed by an adversarial robustness evaluation of each model and a sanity check based on interpretability analysis. The paper compares a standard evaluation on training and testing data samples with an evaluation under adversarial attack techniques. These two levels of analysis are essential to verify model weaknesses and sensitivity, both over the complete video and frame by frame.

Highlights

  • Violence recognition is a sub-area of human action recognition that can be divided between internal and external environments [1]

  • It is generally not feasible to include the audio signal in outdoor surveillance, but it can be included indoors

  • This paper presents three models for in-car violence recognition and evaluates these models based on adversarial attacks and interpretability methods


Summary

Introduction

Violence recognition is a sub-area of human action recognition that can be divided between internal and external environments [1]. This is a crucial distinction, as the problems to be addressed in these two types of environments are quite different. In internal surveillance, where capturing and adequately filtering the audio signal is straightforward, audio can help obtain better results [2]. The audio and video signals can go through a multimodal fusion process to increase the success rate [3]. Recent studies include audio because microphones are very powerful sensors that capture context and human behaviour [2].
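As a rough illustration of the multimodal fusion idea mentioned above, the sketch below combines per-class probabilities from a video model and an audio model by weighted averaging (often called late fusion). This is only one possible fusion scheme and is not taken from the paper or from [3]; the function name late_fusion, the video_weight parameter, the class order and the example scores are all hypothetical.

```python
from typing import List, Sequence


def late_fusion(
    video_probs: Sequence[float],
    audio_probs: Sequence[float],
    video_weight: float = 0.5,
) -> List[float]:
    """Weighted average of per-class probabilities from two modalities.

    Assumption: both inputs are probability vectors over the same classes,
    in the same order. The weighting scheme is illustrative only.
    """
    if len(video_probs) != len(audio_probs):
        raise ValueError("both modalities must score the same classes")
    if not 0.0 <= video_weight <= 1.0:
        raise ValueError("video_weight must be in [0, 1]")
    audio_weight = 1.0 - video_weight
    return [
        video_weight * v + audio_weight * a
        for v, a in zip(video_probs, audio_probs)
    ]


# Hypothetical scores for the classes (non-violent, violent).
video_probs = [0.55, 0.45]  # the video model alone leans towards non-violent
audio_probs = [0.20, 0.80]  # the audio model alone leans towards violent

fused = late_fusion(video_probs, audio_probs, video_weight=0.5)

# Index of the class with the highest fused probability.
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

In this made-up example the audio evidence changes the decision that the video scores alone would give. A real system might instead fuse intermediate features or learn the weights from data.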

