Abstract
Effective detection of abnormal behaviors within elevator cabins is critical to elevator safety. Existing deep-learning-based anomaly detection methods mainly rely on convolutional neural networks for spatial feature extraction and recurrent networks for temporal feature learning. Meanwhile, the Transformer architecture has demonstrated strong performance in time series prediction and has since been extended to vision detection tasks. In this study, we present a dual transformer-based framework that can detect falling and fighting events in elevator cabins. The proposed solution uses a vision transformer (ViT) to extract frame-level spatial features, followed by a temporal Transformer that identifies abnormalities in surveillance videos. To validate the effectiveness of the method, we carry out a comprehensive comparison between the proposed transformer-based method and traditional recurrent neural network (RNN) variants.
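The two-stage data flow described above (per-frame ViT features fed to a temporal Transformer) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a single single-head encoder block per stage, layer normalization without learned affine parameters, the ViT [CLS] token as the frame feature, mean pooling over time before the classifier, and random untrained weights. The names `SpatialViT`, `TemporalTransformer`, `classify_clip`, and the label set `CLASSES` are hypothetical and chosen only to show tensor shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

CLASSES = ("normal", "falling", "fighting")  # hypothetical label set


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-5):
    # Layer norm over the feature axis, without learned gain/bias (simplification).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


class EncoderBlock:
    """One pre-norm Transformer encoder block: single-head self-attention + ReLU MLP."""

    def __init__(self, d, rng):
        s = 1.0 / np.sqrt(d)
        self.wq, self.wk, self.wv, self.wo = (rng.normal(0, s, (d, d)) for _ in range(4))
        self.w1 = rng.normal(0, s, (d, 2 * d))
        self.w2 = rng.normal(0, 1.0 / np.sqrt(2 * d), (2 * d, d))

    def __call__(self, x):  # x: (tokens, d)
        h = layer_norm(x)
        q, k, v = h @ self.wq, h @ self.wk, h @ self.wv
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (tokens, tokens)
        x = x + (attn @ v) @ self.wo                     # residual attention
        h = layer_norm(x)
        return x + np.maximum(h @ self.w1, 0) @ self.w2  # residual MLP


def patchify(frame, p):
    """Split an (H, W, C) frame into non-overlapping p x p patches, each flattened."""
    H, W, C = frame.shape
    return (frame.reshape(H // p, p, W // p, p, C)
                 .transpose(0, 2, 1, 3, 4)
                 .reshape(-1, p * p * C))


class SpatialViT:
    """Stage 1 (sketch): maps one frame to a d-dimensional spatial feature."""

    def __init__(self, patch, channels, d, n_patches, rng):
        self.patch = patch
        self.embed = rng.normal(0, 1.0 / np.sqrt(patch * patch * channels),
                                (patch * patch * channels, d))
        self.cls = rng.normal(0, 0.02, (1, d))               # learnable [CLS] token
        self.pos = rng.normal(0, 0.02, (n_patches + 1, d))   # positional embeddings
        self.block = EncoderBlock(d, rng)

    def __call__(self, frame):
        tokens = patchify(frame, self.patch) @ self.embed
        tokens = np.vstack([self.cls, tokens]) + self.pos
        return layer_norm(self.block(tokens))[0]  # [CLS] output as the frame feature


class TemporalTransformer:
    """Stage 2 (sketch): attends across frame features and outputs class probabilities."""

    def __init__(self, d, max_frames, n_classes, rng):
        self.pos = rng.normal(0, 0.02, (max_frames, d))  # temporal position embeddings
        self.block = EncoderBlock(d, rng)
        self.head = rng.normal(0, 1.0 / np.sqrt(d), (d, n_classes))

    def __call__(self, feats):  # feats: (T, d)
        h = self.block(feats + self.pos[: len(feats)])
        return softmax(layer_norm(h).mean(axis=0) @ self.head)  # mean-pool over time


def classify_clip(clip, vit, temporal):
    feats = np.stack([vit(f) for f in clip])  # (T, d) frame-level spatial features
    return temporal(feats)                    # (n_classes,) probabilities


if __name__ == "__main__":
    T, H, W, C, P, D = 8, 32, 32, 3, 8, 64
    vit = SpatialViT(P, C, D, (H // P) * (W // P), rng)
    temporal = TemporalTransformer(D, T, len(CLASSES), rng)
    clip = rng.random((T, H, W, C))  # stand-in for an 8-frame surveillance clip
    print(dict(zip(CLASSES, classify_clip(clip, vit, temporal).round(3))))
```

Because the weights are random, the printed probabilities are meaningless; the sketch only shows how a clip of shape (T, H, W, C) is reduced to T frame features and then to one distribution over event classes. Separating the stages this way lets the spatial encoder run independently on each frame, while the temporal stage alone models inter-frame dependencies, the role that recurrent networks play in the baselines.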