Temporal feature enhancement network with external memory for live-stream video object detection

Masato Fujitake, Akihiro Sugimoto

doi:10.1016/j.patcog.2022.108847

Abstract

This paper proposes a method that exploits temporal context with an attention mechanism to detect objects in real time in live-streaming video. Video object detection is challenging yet essential in practical applications such as robotics, smartphones, and surveillance cameras. Although methods have been proposed to improve either accuracy or run-time speed by exploiting temporal information, the trade-off between the two tends to be ignored. We therefore focus on the trade-off between accuracy and speed, and propose a method that improves accuracy by aggregating past information from a lightweight feature extractor with an attention mechanism. Evaluations on the UA-DETRAC and ImageNet VID datasets demonstrate that our model outperforms state-of-the-art methods on real-time object detection in live-streaming video.

Full Text