Abstract

In video anomaly detection, an autoencoder learns an “informative” representation of normal data by learning to reconstruct a set of input observations; this representation can then be used to identify abnormal data. Building on the encoding–decoding structure, we explore a novel dual ForkNet architecture that dissociates and processes the spatio-temporal representation. It is well known in the information theory community that the coding process of most autoencoders is inevitably accompanied by a certain loss of information. To mitigate this information loss in the dual ForkNet, we propose a novel architectural recalibration approach, which we term Informetrics Recalibration (IR). IR adaptively recalibrates the latent feature representation by explicitly modeling the similarity between corresponding encoder and decoder feature maps, retaining more useful semantic information and thereby producing greater differentiation between normal and abnormal events. Additionally, because the structure of the autoencoder itself makes deep semantic information difficult to obtain, we introduce a Secondary Encoder (SE) in each ForkNet to recalibrate the target feature responses of the latent representation. Our model is easy to train and robust to apply, as it consists essentially of ResNet blocks without complicated modules. Extensive experiments on five publicly available benchmarks show that our model outperforms existing state-of-the-art architectures, demonstrating our framework’s effectiveness.
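The abstract does not specify the exact form of Informetrics Recalibration; as a minimal sketch of one plausible reading, the snippet below gates each channel of a decoder feature map by its cosine similarity to the corresponding encoder channel, so that channels whose semantics survived the coding process are emphasized. The function name, sigmoid gating, and channel-wise granularity are all assumptions for illustration, not the paper's definition.

```python
import numpy as np

def informetrics_recalibration(enc_feat, dec_feat):
    """Hypothetical sketch of similarity-based recalibration.

    enc_feat, dec_feat: corresponding feature maps of shape (C, H, W).
    Each decoder channel is rescaled by a gate derived from its cosine
    similarity to the matching encoder channel.
    """
    C = enc_feat.shape[0]
    e = enc_feat.reshape(C, -1)
    d = dec_feat.reshape(C, -1)
    eps = 1e-8  # avoid division by zero for all-zero channels
    # per-channel cosine similarity between encoder and decoder features
    sim = (e * d).sum(axis=1) / (
        np.linalg.norm(e, axis=1) * np.linalg.norm(d, axis=1) + eps
    )
    # squash similarity to (0, 1) and use it as a channel-wise weight
    gate = 1.0 / (1.0 + np.exp(-sim))
    return dec_feat * gate[:, None, None]
```

Under this sketch, channels where encoder and decoder disagree are attenuated rather than discarded, which is one simple way to "retain more useful semantic information" while still letting reconstruction error separate normal from abnormal events.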
