Abstract

Crime causes significant human and economic losses. Every year, billions of dollars are lost to attacks, crimes, and scams. Surveillance camera networks generate vast amounts of video, and surveillance staff cannot process all of it in real time. Human vision has critical limitations, and among them visual focus is one of the most important in surveillance. For example, in a surveillance room, a crime can occur in a different screen segment or on a different monitor, and the staff may overlook it. Our proposal focuses on shoplifting by analyzing situations that an average person would consider ordinary but that may eventually lead to a crime. While other approaches identify the crime itself, we instead model suspicious behavior (the behavior that may occur before the build-up phase of a crime) by detecting precise video segments with a high probability of containing a shoplifting crime. This gives the staff more opportunities to act and prevent the crime. We implemented a 3DCNN model as a video feature extractor and tested its performance on a dataset composed of daily-action and shoplifting samples. The results are encouraging: the model correctly classifies suspicious behavior in most of the scenarios tested. For example, when classifying suspicious behavior, the best model generated in this work obtains a precision of 0.8571 and a recall of 1 in one of the test scenarios.
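
To make the reported precision and recall values concrete, the short sketch below computes both metrics from confusion-matrix counts. The counts used (6 true positives, 1 false positive, 0 false negatives) are hypothetical and chosen only because they reproduce the reported figures; the abstract does not state the actual counts, and the function name `precision_recall` is ours.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) for the positive class.

    tp: suspicious segments correctly flagged
    fp: normal segments wrongly flagged as suspicious
    fn: suspicious segments the model missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts (an assumption, not taken from the paper):
# 6 suspicious segments flagged correctly, 1 false alarm, none missed.
precision, recall = precision_recall(tp=6, fp=1, fn=0)
print(round(precision, 4), recall)
```

A recall of 1 means that no suspicious segment was missed in that scenario, while a precision below 1 means that some of the flagged segments were false alarms.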

Highlights

  • According to the 2018 National Retail Security Survey (NRSS) [1], inventory shrink (a loss of inventory related to theft, shoplifting, error, or fraud) had an impact of $46.8 billion on the U.S. retail economy in 2017

  • Among the main contributions of this work, we propose a method to extract video segments that feed a model based on a 3D Convolutional Neural Network (3DCNN), which learns to classify suspicious behavior

  • The model achieves an accuracy of 75% in detecting suspicious behavior before a crime is committed, on a dataset composed of daily-action and shoplifting samples

Summary

Introduction

According to the 2018 National Retail Security Survey (NRSS) [1], inventory shrink (a loss of inventory related to theft, shoplifting, error, or fraud) had an impact of $46.8 billion on the U.S. retail economy in 2017. Surveillance camera networks generate vast amounts of video, and the surveillance staff cannot process all the available information. Real-time analysis of every camera has become an exhausting task due to human limitations. The primary human limitation is the Visual Focus of Attention (VFOA) [2]. Visual focus is a significant human-related disadvantage in the surveillance context: a crime can occur in a different screen segment or on a different monitor, and the staff may not notice it. Every surveillance environment must satisfy a particular set of requirements. Those requirements have driven the creation of specialized tools, both hardware and software, to support the surveillance task. Prevention and reaction are the two primary aims in the surveillance context. However, the security teams take action only after the crime or event has taken place.

